We present a unified axiomatic approach to contextuality and non-locality based on the fact
that both are resource theories. In those theories the main objects are consistent boxes, which
can be transformed by certain operations to achieve certain tasks. The amount of resource is
quantified by appropriate measures of the resource. Following a recent paper [J.I. de Vicente, J.
Phys. A: Math. Theor. 47, 424017 (2014)] and the recent development of an abstract approach to
resource theories, such as entanglement theory, we propose axioms and welcome properties for
operations and measures of resources. As one of the axioms for a measure we propose asymptotic
continuity: the values of the measure on two boxes that are close to each other should not differ
by more than the distance multiplied by a factor depending logarithmically on the dimension of
the boxes. We prove that the relative entropy of contextuality is asymptotically continuous.
Considering another concept from entanglement theory, the convex roof of a measure, we prove
that for some non-local and contextual polytopes the relative entropy of a resource is upper
bounded, up to a constant factor, by the cost of the resource. Finally, we prove that, provided a
measure X of a resource does not increase under an allowed class of operations, such as wirings,
the maximal distillable resource which can be obtained by these operations is bounded from above
by the value of X up to a constant factor. We show explicitly which axioms are used in the proofs
of the presented results, so that analogous results may remain true in other resource theories
with analogous axioms. We also make use of the known distillation protocol of bipartite
nonlocality to show how contextual resources can be distilled.


I. INTRODUCTION 

Quantum contextuality stands among the most expressive
manifestations of nonclassicality in quantum mechanics
[?, ?]. In recent years it has attracted much attention
and has been a topic of extensive studies [?-?]. Apart
from the interest focused on fundamental concepts,
quantum contextuality has been associated with fast
computing [?] and quantum information processing [?],
which opens a path to possible applications of contextual
resources in different scenarios.

A particular example of contextuality, in the framework
where two or more spatially separated parties perform
measurements on their respective subsystems, has been
termed nonlocality [?]. A lot of effort has been devoted to
classifying and quantifying nonlocality, which was
identified as a useful resource in device-independent
quantum information processing (see also Sec. IV of
Ref. [?]). Although different in nature, nonlocal
correlations as well as quantum entanglement proved
useful in information processing tasks which cannot be
performed with the sole use of classical correlations. This
in turn led to the development of the resource theories of
entanglement [10] and nonlocality [11].

When formulating a resource theory, three basic
ingredients need to be considered. Firstly, the concept of a
resource needs to be developed and shown to be useful for
some specific tasks which remain unattainable when only
non-resource objects are at one's disposal. Secondly, there
must be operations by which one may transform resources
into one another. Thirdly, one needs a tool to compare
different objects by measuring the resource they contain,
namely a measure of the resource.

Only recently has a theory of resources been formalized
with respect to nonlocal resources [11] and steering
resources [?], and a general abstract characterization of
resource theories has been formulated [?], which captures
all the features and relations that they have in common.
In this light we develop the recent theory of nonlocal
resources [11] to include the notion of contextuality, a
particular manifestation of which is nonlocality. After
identifying contextual systems as useful for some
computational tasks [?], in this paper we treat contextual
systems ("boxes") as resources in a similar way as has
been done for nonlocal resources. In particular, we
describe the notion of contextuality and then, based on the
resource theory of entanglement, we formulate a set of
axioms for the transformations of contextual resources, as
well as for measures intended to quantify the value of
given resources.

One of the axioms that we explore is asymptotic
continuity, a very desirable property from the experimental
point of view. Suppose we wish to quantify the amount
of resource of a box B using the measure X. However,
the experimental realization of the box produces an
imperfect box B' which is close to B:

\[ \|B - B'\| \le \epsilon, \tag{1} \]

where $\|\cdot\|$ denotes the trace distance between two boxes
[15]. For any measure of a resource we want its values on
the two boxes to differ by no more than the distance
between them times a factor logarithmic in the
dimensionality of the box. Asymptotic continuity of a
measure X therefore means that

\[ |X(B) - X(B')| \le \epsilon \log d + f(\epsilon), \tag{2} \]

where d is the dimensionality of the boxes and f is a function
such that $f(\epsilon) \to 0$ as $\epsilon \to 0$.

Only recently have measures of contextuality been
developed, which make it possible to quantify the amount of
contextuality of boxes [16]. In particular, two measures of
contextuality have been introduced: the mutual information
of contextuality (MIC) and the relative entropy of
contextuality (REC). As stated earlier, from the
experimental point of view it is essential that any measure
of a resource, in particular a measure of contextuality, be
asymptotically continuous. In this respect the main result
of the paper is the proof that the measure MIC fulfills the
axiom of asymptotic continuity. Since the measures MIC and
REC are equivalent [16], asymptotic continuity also holds
for the latter, the relative entropy distance measure.

The next result of the paper shows that the relative
entropy measure of a resource is upper-bounded (up
to a constant factor) by another measure defined in
Ref. [16], the cost of a resource. We then give
examples of applications of this result for a class of
bipartite boxes with binary inputs and binary outputs (the
most nonlocal of which is the PR box [17], [?]), as well as
for a class of boxes related to contextual n-cycles [?].
Furthermore, we consider distillation of a resource under
the assumption that the measure of the resource fulfills the
axiom of monotonicity (i.e., that the measure does not
increase under the set of allowed operations), and show
that distillable contextuality is upper-bounded (up to a
constant factor) by the value of the measure. We also make
use of a distillation protocol as originally devised in [?] to
show how contextual resources can be distilled. We then
analyze the application of the bound with respect to two
relative-entropy-based measures of contextuality.


II. AXIOMATIC APPROACH TO RESOURCE 
THEORY OF CONTEXTUALITY AND 
NONLOCALITY 

We present a framework for the construction of a general
resource theory of contextuality, which can be regarded as
a development of the resource theory of nonlocality
presented in Ref. [11]. After formalizing the notion of
contextual resources in relation to nonlocal resources,
which are to be understood as a specific form of the
former, we proceed to formulate the axioms for operations
on contextual resources and the axioms for the measures
intended to quantify the value of a given resource.


A. Contextual or nonlocal boxes 

In the present setting the objects of interest are the
measurement statistics, without any reference to what
actual measurements are being performed on actual
physical systems. At this point we do not need to assume
that the measurements are performed by spatially
separated parties, which is what the notion of nonlocality
is all about. In the present view we consider a set of
observables $\mathcal{M}$ that can be measured on any physical
system, where a measurement of each $M_i \in \mathcal{M}$ gives an
outcome $m_i$ with probability $p(m_i|M_i)$. Let us assume
that there exist subsets of $\mathcal{M}$ consisting of jointly
measurable observables. Each such subset we will call a
context, denoted by c. The joint probability distribution of
obtaining the outcomes $(m_1, m_2, \ldots, m_k)$ while measuring
the observables $M_1, M_2, \ldots, M_k$ belonging to a context c
we will denote by

\[ p(\lambda_c) := p(m_1, m_2, \ldots, m_k | M_1, M_2, \ldots, M_k), \tag{3} \]

where $\lambda_c = (m_1, m_2, \ldots, m_k)$ with $M_1, M_2, \ldots, M_k \in c$.
Note that an observable $M_j$ may belong to several
different contexts. A box B is then a set of joint
probability distributions $B = \{p_B(\lambda_c)\}$ for all contexts in
$\mathcal{M}$ (we will omit the subscript B of $p_B(\lambda_c)$ when it is not
necessary).

There is, however, a significant constraint which must
be obeyed by all boxes to be physically realizable, namely
the consistency condition, which states that the marginal
distributions for observables that belong to several
contexts are independent of the chosen measuring context:

\[ \sum_{\lambda_i \setminus (\lambda_i \cap \lambda_j)} p(\lambda_i) = \sum_{\lambda_j \setminus (\lambda_i \cap \lambda_j)} p(\lambda_j). \tag{4} \]

The boxes that fulfill the consistency condition we call
consistent boxes, and the set of all consistent boxes we
denote as $\mathcal{B}$. Throughout the paper, when referring to
the set of boxes we mean exclusively the set of consistent
boxes $\mathcal{B}$, without explicit indication.
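
The consistency condition (4) can be checked mechanically. The following is a minimal Python sketch under stated assumptions: the encoding of a box as a dictionary from contexts (tuples of observable labels) to outcome distributions, the labels A0, A1, B0, B1, and all function names are hypothetical choices made for this illustration and are not taken from the paper. The PR box is used as a standard example of a consistent (nonsignaling) box.

```python
from itertools import product


def marginal(dist, context, keep):
    """Marginalise a context distribution onto the observables in `keep`."""
    idx = [context.index(m) for m in keep]
    out = {}
    for outcomes, prob in dist.items():
        key = tuple(outcomes[i] for i in idx)
        out[key] = out.get(key, 0.0) + prob
    return out


def is_consistent(box, tol=1e-12):
    """Check that observables shared by two contexts have equal marginals."""
    contexts = list(box)
    for i, ci in enumerate(contexts):
        for cj in contexts[i + 1:]:
            shared = [m for m in ci if m in cj]
            if not shared:
                continue
            mi = marginal(box[ci], ci, shared)
            mj = marginal(box[cj], cj, shared)
            for key in set(mi) | set(mj):
                if abs(mi.get(key, 0.0) - mj.get(key, 0.0)) > tol:
                    return False
    return True


def pr_box():
    """PR box: a XOR b = x AND y, each allowed outcome pair has probability 1/2."""
    box = {}
    for x, y in product((0, 1), repeat=2):
        box[(f"A{x}", f"B{y}")] = {
            (a, b): 0.5
            for a, b in product((0, 1), repeat=2)
            if (a ^ b) == (x & y)
        }
    return box


def signalling_box():
    """A box whose marginal for A0 depends on the context, hence inconsistent."""
    return {
        ("A0", "B0"): {(0, 0): 1.0},
        ("A0", "B1"): {(1, 0): 1.0},
    }
```

With this encoding, `is_consistent(pr_box())` holds while `is_consistent(signalling_box())` does not, since in the second box the outcome of A0 changes with the accompanying observable.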

At this stage it should be noticed that the term box
is more general than the classical probabilistic model for
the whole set of observables $\mathcal{M}$, which is given by a
joint probability distribution $p(\lambda|\mathcal{M})$ such that for all
contexts

\[ p(\lambda_c) = \sum_{\lambda \setminus \lambda_c} p(\lambda | \mathcal{M}). \tag{5} \]

The valuable resources from this perspective are those
boxes which cannot be modeled classically, i.e., which
cannot be described by a single joint probability distribution
for all observables. The set of boxes that constitute
resources we denote as $\mathcal{B}_v$, while its elements, i.e.,
particular valuable boxes, we denote as $B_v$. In contrast,
the set of classical boxes, which are useless from the point
of view of valuable resources, we denote as $\mathcal{B}_{nv}$, and
its elements as $B_{nv}$. Furthermore, it is known that each
box $B_{nv} \in \mathcal{B}_{nv}$ can be decomposed into a mixture of
deterministic boxes which constitute the extremal points of
the convex polytope identified with $\mathcal{B}_{nv}$. The
extremal points of the set $\mathcal{B}_{nv}$ we denote as $E_{nv}$, and
each box from $\mathcal{B}_{nv}$ can be written as $B_{nv} = \sum_i p_i E_{nv}^{(i)}$
for a proper distribution $\{p_i\}$. Notice that in general
the extremal boxes $E_{nv}$ are not all the extremal vertices
of the set of consistent boxes $\mathcal{B}$. The extremal boxes of
$\mathcal{B}$ which do not belong to the set $\mathcal{B}_{nv}$, and which
constitute valuable resources, we denote as $E_v$.

Let us now formalize the resource theory approach with
regard to the notion of contextuality. The crucial aspect
of contextual systems is that one cannot ascribe values to
each observable of the physical system prior to
measurement in such a way that they obey the observed
statistics. Otherwise, i.e., if one could in principle
preassign the observed values of all measurements, it
would mean that the construction of a probabilistic
model $p(\lambda|\mathcal{M})$ is possible. Therefore, if we regard the
set of contextual boxes $\mathcal{B}_C$ as the set of resources, then
the set of noncontextual boxes $\mathcal{B}_{NC}$ is composed of
valueless objects, $\mathcal{B}_{nv} = \mathcal{B}_{NC}$.

Suppose now that we have instances of physical systems
composed of two (or in general more than two)
subsystems such that each subsystem is measured by one of
two spatially separated parties, Alice and Bob. Due
to space-like separation, each pair of measurements
performed by the two parties, $x \in \mathcal{M}_A$ and $y \in \mathcal{M}_B$,
commutes, hence giving naturally arising contexts $c = (x, y)$.
From this perspective, the boxes are given by families
of probability distributions $p(\lambda_c) = p(a, b|x, y)$, where
$\lambda_c = (a, b)$ is the pair of outcomes of the measurements
$(x, y)$. A comprehensive resource theory of nonlocality
was presented in Ref. [11], where nonlocal boxes were
identified as valuable resources, whereas local behaviors
were considered as the boxes from the set $\mathcal{B}_{nv}$
(see also Ref. [?]).

With respect to nonlocality, the consistency conditions
(4) are the strict analogues of the nonsignaling conditions:
the marginal distributions for one party are independent
of the measurement choice made by the other, spatially
separated party. Hence the consistency conditions assure
nonsignaling in the case when the measurement contexts
arise from space-like separation of different parties,
and in this case the set of consistent boxes is to be
understood simply as the set of nonsignaling boxes.

B. Axioms of box-resource theories 

Once we wish to use the boxes for some specific tasks,
we need to specify the permitted transformations by which
we can process the boxes. Regarding the different sets of
boxes, which either constitute valuable resources
($\mathcal{B}_v$) or not ($\mathcal{B}_{nv}$), we formulate two axioms
that the operations must obey:

(O1) for general operations $\Gamma$, given by
normalization-preserving and consistency-preserving
transformations, we must have

\[ \forall_{B_1 \in \mathcal{B}_1} \exists_{B_2 \in \mathcal{B}_2} \quad \Gamma(B_1) = B_2. \tag{6} \]


Notice that general transformations $\Gamma$ transform boxes
from one set into another without the assumption that the
dimensionality of the boxes is preserved. In particular,
some operations $\Gamma$ may change the number of contexts of
a box by adding or removing some observables from the
set $\mathcal{M}$. When the dimensionality of the boxes (the number
of inputs with the respective number of outputs) is
preserved, the general operations can be described in
matrix form [22] (provided that the consistency of the
transformed box is preserved):



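\[ \Gamma = \begin{pmatrix}
\alpha_{11}\Gamma_{11} & \cdots & \alpha_{1|c|}\Gamma_{1|c|} \\
\vdots & \ddots & \vdots \\
\alpha_{|c|1}\Gamma_{|c|1} & \cdots & \alpha_{|c||c|}\Gamma_{|c||c|}
\end{pmatrix}, \tag{7} \]

where $\Gamma_{ij}$ are stochastic matrices acting on the vector of the
probability distribution $p(\lambda_j)$ from a box B, $0 \le \alpha_{ij} \le 1$,
and $\sum_j \alpha_{ij} = 1$.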
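
To make the block structure of (7) concrete, here is a minimal Python sketch of how such an operation could act on a box. It is an illustration under assumptions, not the paper's construction: the box is encoded as a list of probability vectors (one per context), the blocks are column-stochastic matrices given as nested lists, and the names `mat_vec` and `apply_operation` are hypothetical. The sketch only guarantees normalization of the output; it does not verify that the transformed box is consistent.

```python
def mat_vec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[k] * vector[k] for k in range(len(vector))) for row in matrix]


def apply_operation(alpha, gamma, box):
    """Output context i is sum_j alpha[i][j] * gamma[i][j] applied to box[j]."""
    transformed = []
    for i in range(len(alpha)):
        size = len(gamma[i][0])  # number of rows = size of the output alphabet
        acc = [0.0] * size
        for j, p_j in enumerate(box):
            term = mat_vec(gamma[i][j], p_j)
            acc = [a + alpha[i][j] * t for a, t in zip(acc, term)]
        transformed.append(acc)
    return transformed


IDENTITY = [[1.0, 0.0], [0.0, 1.0]]
FLIP = [[0.0, 1.0], [1.0, 0.0]]  # relabels the two outcomes

alpha = [[1.0, 0.0], [0.5, 0.5]]                    # each row sums to 1
gamma = [[IDENTITY, IDENTITY], [FLIP, IDENTITY]]    # column-stochastic blocks
box = [[0.8, 0.2], [0.3, 0.7]]                      # two contexts, binary outcomes

result = apply_operation(alpha, gamma, box)
```

In this example the first output context is a copy of the first input context, while the second is an equal mixture of the flipped first context and the unchanged second context.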

(O2) for free operations C, which constitute a subset
of the general operations $\Gamma$, we must have

\[ \forall_{B_1 \in \mathcal{B}_{nv}} \exists_{B_2 \in \mathcal{B}_{nv}} \quad C(B_1) = B_2. \tag{8} \]

Similarly to the set of LOCC (local operations and
classical communication) in the resource theory of
entanglement [10], and the set of WCCPI (wirings and
classical communication prior to inputs) in the resource
theory of nonlocality [11], [23], the set of free operations
C is composed of those operations that are "given for free"
in device-independent information processing, and which
by themselves cannot produce a resource from the set
$\mathcal{B}_{nv}$.

An important class of general operations are the reversible
operations $\mathcal{R}$, for which

\[ R^{-1}(R(B)) = B. \tag{9} \]

A particular example of operations from the set $\mathcal{R}$ are
relabelings (see [11] for details).

When considering boxes as resources we arrive at the
question whether, using general operations, we can obtain a
resource that differs quantitatively from the original box.
As in entanglement theory, we need to specify the relations
between different resources, i.e., whether we can state
quantitatively that a given resource is more valuable than
another. It is therefore desirable to have a measure that
quantifies the value of a given resource. Below we
formulate the axioms that a reliable and usable measure for
a given box, call it X(B), needs to fulfill.

First of all we need to know which boxes do not
constitute a valuable resource. Thus the basic requirement
for the measure X is

(M1) faithfulness, which means

\[ \forall_{B \in \mathcal{B}_{nv}} \quad X(B) = 0. \tag{10} \]

Note that we have restricted the axiom of faithfulness
solely to the condition given above, and do not set the
additional requirement $X(B_v) > 0$. Following Ref. [11] we
recall that a measure defined by usefulness with respect to
a given operational task may give X(B) = 0 even for boxes
which do not belong to the set $\mathcal{B}_{nv}$.

Another property that the measure needs to have must
reflect the fact that single-box general operations cannot
increase the value of a given resource:

(M2) monotonicity, which states that

\[ \forall_{B \in \mathcal{B}} \quad X(\Gamma(B)) \le X(B). \tag{11} \]

As we have seen earlier, there is a set of reversible
operations $\mathcal{R}$ such that each operation R from this set
has an inverse $R^{-1}$ for which $R^{-1}(R(B)) = B$. Since we
have already stated that a measure X should be monotonic
under general operations $\Gamma$, applying monotonicity to
both R and $R^{-1}$ shows that the measure X needs to fulfil

(M3) partial invariance, which means that the measure
X is invariant with respect to reversible operations $\mathcal{R}$
performed on a box:

\[ \forall_{R \in \mathcal{R}} \quad X(R(B)) = X(B). \tag{12} \]

For the purpose of defining the next axiom we need to
specify the distance measure for a pair of boxes.

Definition 1 The trace distance between two boxes B
and B' is given by [15]:

\[ \|B - B'\| := \sup_S \|S(B) - S(B')\|_D, \tag{13} \]

where $\|\cdot\|_D$ denotes the norm of the difference between two
probability distributions, given by the variational
distance between them,

\[ \|p_B(\lambda_c) - p_{B'}(\lambda_c)\|_D := \sum_{\lambda_c} |p_B(\lambda_c) - p_{B'}(\lambda_c)|, \tag{14} \]

where $p_B(\lambda_c)$ ($p_{B'}(\lambda_c)$) is the distribution of a measured
context c of a box B (B') for the outputs $\lambda_c$. The
supremum in (13) is taken over all operations which transform
boxes into a probability distribution according to

\[ S(B) := \sum_c \alpha_c T_c \, p_B(\lambda_c), \tag{15} \]

where $T_c$ are stochastic matrices, $0 \le \alpha_c \le 1$, and
$\sum_c \alpha_c = 1$.
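
A minimal Python sketch of this distance follows. It rests on a reading of Definition 1 that is an assumption of this illustration rather than a statement from the paper: for operations of the form (15), the triangle inequality together with the fact that stochastic matrices do not increase the variational distance bounds the supremum in (13) by the largest per-context distance (14), and this value is reached by choosing $\alpha_c = 1$ and $T_c$ equal to the identity for that context. The dictionary encoding of boxes and the noisy PR box used as test data are hypothetical choices.

```python
def variational_distance(p, q):
    """Eq. (14): sum over outcomes of |p - q| (no factor 1/2)."""
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def box_distance(box_a, box_b):
    """Largest per-context variational distance between two boxes."""
    return max(variational_distance(box_a[c], box_b[c]) for c in box_a)


def noisy_pr_box(eps):
    """(1 - eps) * PR box + eps * uniform noise; contexts are labelled (x, y)."""
    box = {}
    for x in (0, 1):
        for y in (0, 1):
            dist = {}
            for a in (0, 1):
                for b in (0, 1):
                    pr = 0.5 if (a ^ b) == (x & y) else 0.0
                    dist[(a, b)] = (1 - eps) * pr + eps * 0.25
            box[(x, y)] = dist
    return box
```

For instance, mixing the PR box with a fraction `eps` of uniform noise moves every context distribution by exactly `eps` in variational distance, so `box_distance(noisy_pr_box(0.0), noisy_pr_box(eps))` equals `eps` up to rounding.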

At this point we note that any experimental realization
of boxes involves an inevitable deviation from the ideal
box we want to realize. Suppose that an experimentally
realized imperfect box B' is close to the target box B,
which constitutes a valuable resource:

\[ \|B - B'\| \le \epsilon, \tag{16} \]

where $\|\cdot\|$ denotes the trace distance between two boxes
[15]. Now, the desired property of the measure that meets
the requirements raised by non-ideal experimental
processes is:

(M4) asymptotic continuity, which means that for two
close boxes (16) there holds

\[ |X(B) - X(B')| \le \epsilon \log d + f(\epsilon), \tag{17} \]

where d is the dimensionality of the boxes and f is a function
such that $f(\epsilon) \to 0$ as $\epsilon \to 0$.

Apart from the above axioms we also recall two welcome
properties that quantifiers of a resource should satisfy:

(P1) convexity, which means that for any box B
decomposable as $B = \sum_i p_i B_i$ we have

\[ X\Big(\sum_i p_i B_i\Big) \le \sum_i p_i X(B_i), \tag{18} \]

where $B_i$ are arbitrary boxes.

If we consider a measure to be extensive we would also
require the property of

(P2) additivity, for which a measure fulfills

\[ \forall_{B_1 \in \mathcal{B}_1, B_2 \in \mathcal{B}_2} \quad X(B_1 \otimes B_2) = X(B_1) + X(B_2), \tag{19} \]

where $\otimes$ denotes the tensor product of boxes, i.e., for boxes
$B_1 = \{p(\lambda_c)\}$ and $B_2 = \{q(\lambda_{c'})\}$ we have
$B_1 \otimes B_2 = \{p(\lambda_c) q(\lambda_{c'})\}$.
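
The tensor product of boxes used in (19) can be sketched in a few lines of Python. The dictionary encoding (context label to outcome distribution), the function name `tensor`, and the two toy boxes below are assumptions made for this illustration only.

```python
def tensor(box1, box2):
    """Product box: contexts are pairs (c, c'), probabilities multiply."""
    product_box = {}
    for c1, dist1 in box1.items():
        for c2, dist2 in box2.items():
            product_box[(c1, c2)] = {
                (o1, o2): p1 * p2
                for o1, p1 in dist1.items()
                for o2, p2 in dist2.items()
            }
    return product_box


# Two toy single-context boxes (hypothetical data).
coin = {"M": {0: 0.5, 1: 0.5}}
biased = {"N": {0: 0.9, 1: 0.1}}

joint = tensor(coin, biased)
```

Each context of the product box is a pair of contexts of the factors, and each of its distributions is the product $p(\lambda_c) q(\lambda_{c'})$, so normalization is preserved automatically.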

III. MEASURES OF CONTEXTUALITY 

In this section we focus on the contextuality measures
defined in Ref. [16]. We then introduce the tools needed
for proving the asymptotic continuity of the measures.
The crucial assumption for the latter is the closeness of
the two respective boxes. As we shall see, the assumption
remains valid when considering two close quantum states
from which the respective boxes are drawn.

Given a single joint probability distribution of an
arbitrary context c of a box B we can trivially define an
extended joint probability distribution for all observables
in $\mathcal{M}$:

\[ A_B(c) := p(\lambda_c) P(\lambda_r | c), \tag{20} \]

where $P(\lambda_r|c)$ is a joint probability distribution for all
the observables from the set $\mathcal{M} \setminus c$, and $\lambda_r$ is the
corresponding set of measurement outcomes.

Definition 2 For a given box $B = \{p_B(\lambda_c)\}$ we call its
extension a family of distributions:

\[ \mathcal{F}(B) := \{A_B(c)\}, \tag{21} \]

where $A_B(c) := p_B(\lambda_c) P(\lambda_r|c)$ is an extension of the
distribution $p_B(\lambda_c)$ to all observables of the box B.

For further purposes we will write $\mathcal{E}_{p(c)}(B)$ to denote
a distribution, related to the extension $\mathcal{F}(B)$, which for
ease of notation we model as a quantum state:

\[ \mathcal{E}_{p(c)}(B) := \sum_c p(c) |c\rangle\langle c| \otimes A_B(c), \tag{22} \]

for an arbitrary probability distribution p(c) of contexts.

We will now recall the definitions of two measures of
contextuality as presented in [16]. The first measure, the
mutual information of contextuality, captures the idea
that a contextual system cannot be described by a single
joint probability distribution for all observables that
can be measured on the system. It quantifies the
correlations between the different joint probability
distributions consistent with each of the measured contexts
and the index of the chosen context. The second measure, the
relative entropy of contextuality, is defined in terms of
a statistical distance between the set of probability
distributions describing a contextual system and the closest
single noncontextual joint probability distribution. The
relative entropy of contextuality is a natural extension to
contextual systems of an analogous measure of nonlocality
presented in [24], called the statistical strength of
nonlocality proofs.

Definition 3 The mutual information of contextuality of a
given box B is

\[ I_{\max}(B) := \sup_{\{p(c)\}} \inf_{\{A_B(c)\}} I\Big(\sum_c p(c) |c\rangle\langle c| \otimes A_B(c)\Big), \tag{23} \]

where I is the mutual information between probability
distributions.

We also have

Definition 4 The relative entropy of contextuality of a given
box B is

\[ X_{\max}(B) := \sup_{\{p(c)\}} \inf_{\{p(\lambda)\}} \sum_c p(c) \, D(p_B(\lambda_c) \| p(\lambda_c)), \tag{24} \]

where D is the relative entropy, and the infimum is taken over
all joint probability distributions $p(\lambda)$ for all observables
in $\mathcal{M}$, such that $p(\lambda_c)$ is the proper marginal distribution
for a given context:

\[ p(\lambda_c) = \sum_{\mathcal{M} \setminus c} p(\lambda). \tag{25} \]

Note that for any box B we have

\[ I_{\max}(B) = X_{\max}(B), \tag{26} \]

hence we can use $I_{\max}(B)$ and $X_{\max}(B)$ interchangeably.
This equality will prove useful, as the arguments given
in [25] that REC is asymptotically continuous are not
sufficient for a proof of this fact. Instead, we will prove
the asymptotic continuity of MIC.

A specific measure of contextuality which also utilizes the
concept of relative entropy is given in the following.

Definition 5 The uniform relative entropy of contextuality
of a given box B is

\[ X_u(B) := \inf_{\{p(\lambda)\}} \sum_c \frac{1}{n} D(p_B(\lambda_c) \| p(\lambda_c)), \tag{27} \]

where n is the number of different contexts of the box B.
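
The objective inside the infimum of (27) is easy to evaluate for any single candidate joint distribution, which gives an upper bound on $X_u(B)$ (it is not the infimum itself). The Python sketch below does this for the PR box with the uniform joint distribution as the candidate. The encodings, the observable labels, the use of base-2 logarithms, and all function names are assumptions of this illustration.

```python
import math
from itertools import product


def relative_entropy(p, q):
    """D(p || q) in bits; infinite if q vanishes where p does not."""
    total = 0.0
    for key, pk in p.items():
        if pk > 0:
            qk = q.get(key, 0.0)
            if qk == 0:
                return math.inf
            total += pk * math.log2(pk / qk)
    return total


def marginal(joint, observables, context):
    """Marginal of a joint distribution over `observables` onto `context`."""
    idx = [observables.index(m) for m in context]
    out = {}
    for outcomes, prob in joint.items():
        key = tuple(outcomes[i] for i in idx)
        out[key] = out.get(key, 0.0) + prob
    return out


def uniform_objective(box, joint, observables):
    """(1/n) * sum_c D(p_B(lambda_c) || p(lambda_c)) for one candidate joint."""
    n = len(box)
    return sum(
        relative_entropy(dist, marginal(joint, observables, context))
        for context, dist in box.items()
    ) / n


OBSERVABLES = ["A0", "A1", "B0", "B1"]

# PR box: a XOR b = x AND y, allowed outcome pairs equally likely.
PR_BOX = {
    (f"A{x}", f"B{y}"): {
        (a, b): 0.5 for a, b in product((0, 1), repeat=2) if (a ^ b) == (x & y)
    }
    for x, y in product((0, 1), repeat=2)
}

# Candidate noncontextual model: uniform over all 16 joint assignments.
UNIFORM_JOINT = {outcomes: 1 / 16 for outcomes in product((0, 1), repeat=4)}

bound = uniform_objective(PR_BOX, UNIFORM_JOINT, OBSERVABLES)
```

Here every context marginal of the candidate is uniform over four outcomes while the PR box puts weight 1/2 on two of them, so each relative entropy is one bit and the objective evaluates to 1. A better candidate joint distribution could give a smaller value.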

We now observe the following fact: when the trace
distance of two boxes is small, then for any extension
of one box there exists an extension of the other which
is close to the former.

Observation 1 If two boxes $B = \{p_B(\lambda_c)\}$ and
$B' = \{p_{B'}(\lambda_c)\}$ satisfy $\|B - B'\| \le \delta$ for some $\delta > 0$,
then for any fixed $\{p(c)\}$ we have

\[ \forall_{\mathcal{E}_{p(c)}(B)} \exists_{\mathcal{E}_{p(c)}(B')} \quad \|\mathcal{E}_{p(c)}(B) - \mathcal{E}_{p(c)}(B')\|_D \le \delta. \tag{28} \]


Proof. The left-hand side of inequality (28) equals

\[ \sum_c p(c) \|A_B(c) - A_{B'}(c)\|_D, \tag{29} \]

where

\[ A_B(c) := p_B(\lambda_c) P(\lambda_r|c), \tag{30} \]

\[ A_{B'}(c) := p_{B'}(\lambda_c) P(\lambda_r|c). \tag{31} \]

Consider the difference between the two distributions
$A_B(c)$ and $A_{B'}(c)$:

\[ \begin{aligned}
\|A_B(c) - A_{B'}(c)\|_D
&= \sum_{\lambda_c, \lambda_r} |p_B(\lambda_c) P(\lambda_r|c) - p_{B'}(\lambda_c) P(\lambda_r|c)| \\
&= \sum_{\lambda_c} \Big( |p_B(\lambda_c) - p_{B'}(\lambda_c)| \sum_{\lambda_r} P(\lambda_r|c) \Big) \\
&= \sum_{\lambda_c} |p_B(\lambda_c) - p_{B'}(\lambda_c)| \\
&= \|S^*(B) - S^*(B')\|_D \\
&\le \sup_S \|S(B) - S(B')\|_D \\
&\le \delta,
\end{aligned} \tag{32} \]

where the third equality uses the fact that $P(\lambda_r|c)$ is a
probability distribution for each c, the first inequality
holds because $S^*$ is the particular operation with
$\alpha_c = 1$ and $T_c$ the identity matrix for the chosen
context c, and the last inequality follows from the
assumption of the observation. Using the last inequality
in (29) we obtain (28), which ends the proof.

We introduce the following notation for the corresponding
extensions given by Eq. (31).

Definition 6 Consider two boxes $B = \{p_B(\lambda_c)\}$ and
$B' = \{p_{B'}(\lambda_c)\}$. For any extension $\mathcal{F}(B)$, the
extension $\mathcal{F}(B')$ given by Eq. (31), which satisfies
$\|\mathcal{E}_{p(c)}(B) - \mathcal{E}_{p(c)}(B')\| \le \delta$, we denote as $\mathcal{F}(B'|B)$.

In the next section we will derive the asymptotic
continuity of the mutual information of contextuality
$I_{\max}$. As we shall see in the following, the closeness of
quantum states $\rho$ and $\rho'$ implies the closeness of the
respective boxes B and B', assuming perfect measurements
on the quantum states. Consider two boxes, B and B', each
of them generated by the same set of measurements $M_c$, but
on different quantum states $\rho$ and $\rho'$, respectively, such
that $\|\rho - \rho'\| \le \delta$, where for matrices
$\|A\| := \mathrm{Tr}\sqrt{A A^\dagger}$. Then $I_{\max}$ can be proved
asymptotically continuous with respect to the trace distance
of quantum states, $\|\rho - \rho'\| \le \delta$.

Observation 2 Consider any two states $\rho$, $\rho'$ such that
$\|\rho - \rho'\| \le \delta$. Let $O = \{M_c\}$ be a set of operators which
generates two respective boxes B and B' on the states $\rho$ and
$\rho'$, in such a way that each distribution of
the box B (B') is given by $p_B(\lambda_c) = \{\mathrm{Tr} M_c \rho\}$
($p_{B'}(\lambda_c) = \{\mathrm{Tr} M_c \rho'\}$). Then we have:

\[ \|B - B'\| \le 2\delta. \tag{33} \]

Proof. Consider a distribution $\alpha^*$ and stochastic matrices
$T^*$ that realize the supremum in the definition of $\|B - B'\|$.
Let $M^*$ be a measurement operator defined as

\[ M^* := \sum_c \alpha_c^* T_c^* M_c. \tag{34} \]

From the assumption we have that

\[ \delta \ge \|\rho - \rho'\| = \sup_M \mathrm{Tr} M(\rho - \rho') \ge \mathrm{Tr} P_+ M^*(\rho - \rho'), \tag{35} \]

where $P_+$ is a projector onto the positive subspace of
$M^*(\rho - \rho')$. The RHS of the last inequality above is
equal to

\[ S^+ = \sum_{\{+\}} (r_\rho - r_{\rho'}), \tag{36} \]

where $r_\rho = \mathrm{Tr} M^* \rho$ ($r_{\rho'} = \mathrm{Tr} M^* \rho'$) and the sum is over
all terms with $r_\rho - r_{\rho'} > 0$. Exchanging the roles of $\rho$
and $\rho'$ we obtain analogously:

\[ \delta \ge \|\rho' - \rho\| = \sup_M \mathrm{Tr} M(\rho' - \rho) \ge \sum_{\{+\}} (r_{\rho'} - r_\rho) = \sum_{\{-\}} |r_\rho - r_{\rho'}| = S^-. \tag{37} \]

From the above inequality and (35) we have that
$\max(S^+, S^-) \le \delta$, and since the distributions $r_\rho$ and
$r_{\rho'}$ are obtained with the optimal $\alpha^*$ and $T^*$, we obtain

\[ \|B - B'\| = \|r_\rho - r_{\rho'}\|_D = S^+ + S^- \le 2\max(S^+, S^-) \le 2\delta, \tag{38} \]

which ends the proof.
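
As a sanity check of Observation 2, the bound can be tested on qubits without any linear-algebra library. The Python sketch below is an illustration under several assumptions that are not in the paper: states are parametrized by Bloch vectors r, so that $\rho - \rho' = (\Delta r \cdot \sigma)/2$ has eigenvalues $\pm|\Delta r|/2$ and $\mathrm{Tr}\sqrt{AA^\dagger} = |\Delta r|$; the box consists of two single-observable contexts (projective measurements of $\sigma_x$ and $\sigma_z$); and the box distance is evaluated as the largest per-context variational distance, which is one reading of Definition 1.

```python
import math


def outcome_probs(bloch, axis):
    """Projective Pauli measurement along `axis` (0 = x, 1 = y, 2 = z)."""
    return {+1: (1 + bloch[axis]) / 2, -1: (1 - bloch[axis]) / 2}


def make_box(bloch):
    """Two single-observable contexts: sigma_x and sigma_z."""
    return {"X": outcome_probs(bloch, 0), "Z": outcome_probs(bloch, 2)}


def trace_norm_difference(r1, r2):
    """Tr sqrt(A A^dagger) for A = rho - rho' of two qubits equals |r1 - r2|."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(r1, r2)))


def box_distance(box_a, box_b):
    """Largest per-context variational distance (assumed reading of Def. 1)."""
    return max(
        sum(abs(box_a[c][o] - box_b[c][o]) for o in box_a[c]) for c in box_a
    )


r_ideal = (0.0, 0.0, 1.0)   # the state |0><0|
r_noisy = (0.1, 0.0, 0.9)   # a nearby mixed state

delta = trace_norm_difference(r_ideal, r_noisy)
distance = box_distance(make_box(r_ideal), make_box(r_noisy))
```

For these two states `delta` is about 0.141 and `distance` is 0.1, comfortably below the bound $2\delta$ of (33).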


IV. ASYMPTOTIC CONTINUITY OF $I_{\max}$

In this section we will prove the asymptotic continuity
of the measure of contextuality given by the mutual
information of contextuality (23). We will focus on MIC
rather than the relative entropy of contextuality, and we
will introduce a novel method of proving asymptotic
continuity. Nevertheless, the asymptotic continuity of the
former (MIC) implies the same for the latter (REC), since
the two measures are equal to one another [16].

At this stage let us also recall the property of asymptotic
continuity for the von Neumann entropy. Consider two
quantum states $\rho_1$ and $\rho_2$ of dimension d, for which

\[ \|\rho_1 - \rho_2\| \le 1/2, \tag{39} \]

where $\|\cdot\|$ is the trace norm. The von Neumann entropy is
asymptotically continuous since

\[ |S(\rho_1) - S(\rho_2)| \le \|\rho_1 - \rho_2\| \log d + \eta(\|\rho_1 - \rho_2\|), \tag{40} \]

where $\eta(x) = -x \log x$.

Furthermore, the quantum conditional entropy is also
asymptotically continuous [26], i.e., for $\|\rho_1 - \rho_2\| \le \epsilon$
we have

\[ |S_{X|Y}(\rho_1) - S_{X|Y}(\rho_2)| \le 4\|\rho_1 - \rho_2\| \log d + 2\eta(\|\rho_1 - \rho_2\|) + 2\eta(1 - \|\rho_1 - \rho_2\|), \tag{41} \]

where X, Y denote two subsystems of the states $\rho_1$ and
$\rho_2$.

Now, Observation 1 allows us to write the following.

Observation 3 If for some $\delta > 0$ two boxes
$B = \{p_B(\lambda_c)\}$ and $B' = \{p_{B'}(\lambda_c)\}$ satisfy

\[ \|B - B'\| \le \delta, \tag{42} \]

then for any fixed $\{p(c)\}$ we have

\[ \forall_{\mathcal{E}_{p(c)}(B)} \exists_{\mathcal{E}_{p(c)}(B'|B)} \quad |I(\mathcal{E}_{p(c)}(B)) - I(\mathcal{E}_{p(c)}(B'|B))| \le g(\delta), \tag{43} \]

with

\[ g(\delta) = 5\delta \log d + 2\eta(1 - \delta) + 3\eta(\delta), \tag{44} \]

where I is the mutual information and
$d = \min(\prod_{i=1}^{n} a_i, |c|)$, where $\prod_{i=1}^{n} a_i$ and $|c|$
are the dimensions of the two subsystems of the box B,
respectively.

Proof. Due to Observation 1, for every $\mathcal{E}_{p(c)}(B)$ there
exists $\mathcal{E}_{p(c)}(B'|B)$ such that
$\|\mathcal{E}_{p(c)}(B) - \mathcal{E}_{p(c)}(B'|B)\| \le \delta$.
Using the definition of mutual information, the left-hand
side of (43) can be written as

\[ \begin{aligned}
|I(\mathcal{E}_{p(c)}(B)) - I(\mathcal{E}_{p(c)}(B'|B))|
= |&S_X(\mathcal{E}_{p(c)}(B)) - S_{X|Y}(\mathcal{E}_{p(c)}(B)) \\
&- S_X(\mathcal{E}_{p(c)}(B'|B)) + S_{X|Y}(\mathcal{E}_{p(c)}(B'|B))|.
\end{aligned} \tag{45} \]

Then, making use of the triangle inequality, we can bound
this expression by

\[ \begin{aligned}
&|S_X(\mathcal{E}_{p(c)}(B)) - S_X(\mathcal{E}_{p(c)}(B'|B))| \\
&\quad + |S_{X|Y}(\mathcal{E}_{p(c)}(B)) - S_{X|Y}(\mathcal{E}_{p(c)}(B'|B))| \\
&\quad \le 5\delta \log d + 2\eta(1 - \delta) + 3\eta(\delta),
\end{aligned} \tag{46} \]

where in the last step we used the asymptotic continuity of
the von Neumann entropy (40) as well as of the quantum
conditional entropy (41).
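
To get a feeling for the size of the bound, the function $g(\delta)$ of (44) can be evaluated numerically. The short Python sketch below assumes base-2 logarithms (the text does not fix the base) and uses the convention $\eta(0) = 0$; the function names are hypothetical.

```python
import math


def eta(x):
    """eta(x) = -x log2(x), with the convention eta(0) = 0."""
    return 0.0 if x <= 0 else -x * math.log2(x)


def g(delta, d):
    """Eq. (44): 5*delta*log(d) + 2*eta(1 - delta) + 3*eta(delta)."""
    return 5 * delta * math.log2(d) + 2 * eta(1 - delta) + 3 * eta(delta)
```

For example, `g(0.01, 4)` is roughly 0.33, and `g(delta, d)` tends to zero as `delta` tends to zero for fixed `d`, which is the behaviour required by axiom (M4).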

Consider the following lemma:

Lemma 1 For any real-valued function f, and for any
two sets T and T', if there exists a real-valued function
g such that for any positive $\delta$ and any $\rho \in T$ there exists
$\sigma_\rho \in T'$ such that

\[ |f(\rho) - f(\sigma_\rho)| \le g(\delta), \tag{47} \]

and for any $\sigma \in T'$ there exists $\rho_\sigma \in T$ such that

\[ |f(\rho_\sigma) - f(\sigma)| \le g(\delta), \tag{48} \]

then there holds

\[ \Big| \inf_{\rho \in T} f(\rho) - \inf_{\sigma \in T'} f(\sigma) \Big| \le g(\delta) + \delta, \tag{49} \]

and

\[ \Big| \sup_{\rho \in T} f(\rho) - \sup_{\sigma \in T'} f(\sigma) \Big| \le g(\delta) + \delta, \tag{50} \]

provided that $\inf_{\rho \in T} f(\rho)$ and $\inf_{\sigma \in T'} f(\sigma)$, as well as
$\sup_{\rho \in T} f(\rho)$ and $\sup_{\sigma \in T'} f(\sigma)$, are bounded.

We prove the lemma in Appendix A. We can now state
the following theorem:

Theorem 1 For any fixed p(c) the function

\[ I_{p(c)}(B) = \inf_{\{A_B(c)\}} I\Big(\sum_c p(c) |c\rangle\langle c| \otimes A_B(c)\Big) \tag{51} \]

is asymptotically continuous, i.e., for any $\|B - B'\| \le \delta$
there is

\[ |I_{p(c)}(B) - I_{p(c)}(B')| \le g(\delta) + \delta, \tag{52} \]

with $g(\delta)$ given in (44).

Proof. Assume that $\|B - B'\| \le \delta$ for two boxes
$B = \{p_B(c)\}$ and $B' = \{p_{B'}(c)\}$. Let us first consider
$I_{p(c)}(B)$. By the definition of the infimum, there exists a
sequence of extensions $\mathcal{F}^n(B) = \{A^n_B(c)\}$ such that

\[ \lim_{n \to \infty} I(\mathcal{E}^n_{p(c)}(B)) = I_{p(c)}(B), \tag{53} \]

and similarly there exist $\mathcal{F}^n(B') = \{A^n_{B'}(c)\}$ such that

\[ \lim_{n \to \infty} I(\mathcal{E}^n_{p(c)}(B')) = I_{p(c)}(B'). \tag{54} \]

Let us now specify the following two sets:

\[ T := \bigcup_{n=1}^{\infty} \{ I(\mathcal{E}^n_{p(c)}(B)), I(\mathcal{E}^n_{p(c)}(B|B')) \}, \tag{55} \]

\[ T' := \bigcup_{n=1}^{\infty} \{ I(\mathcal{E}^n_{p(c)}(B'|B)), I(\mathcal{E}^n_{p(c)}(B')) \}, \tag{56} \]

and also denote by V (V') the set of $I(\mathcal{E}_{p(c)}(B))$
($I(\mathcal{E}_{p(c)}(B'))$) for all distributions $A_B(c)$ ($A_{B'}(c)$).
Notice that $\{I(\mathcal{E}^n_{p(c)}(B))\} \subset T$, and therefore we have that

\[ \inf T \le \inf \{ I(\mathcal{E}^n_{p(c)}(B)) \} = I_{p(c)}(B). \tag{57} \]

On the other hand, $T \subset V$, and therefore

\[ I_{p(c)}(B) = \inf V \le \inf T, \tag{58} \]

where the equality is by the definition of $I_{p(c)}(B)$. The
inequalities (57) and (58) lead to

\[ \inf T = I_{p(c)}(B), \tag{59} \]

and the same reasoning also gives

\[ \inf T' = I_{p(c)}(B'). \tag{60} \]

Now, due to Observation 3 there is

\[ |I(\mathcal{E}^n_{p(c)}(B)) - I(\mathcal{E}^n_{p(c)}(B'|B))| \le g(\delta), \tag{61} \]

\[ |I(\mathcal{E}^n_{p(c)}(B')) - I(\mathcal{E}^n_{p(c)}(B|B'))| \le g(\delta). \tag{62} \]

Let us take in the assumption of Lemma 1 the function
$f : \mathbb{R} \to \mathbb{R}$ to be simply the identity function $f(x) = x$
(it is easy to see that the other assumptions are satisfied by
the construction, by the fact that the boxes B and B' are
close to each other, and by inequalities (61)-(62)). Then,
by Lemma 1 we obtain

\[ \Big| \inf_{t \in T} t - \inf_{t' \in T'} t' \Big| \le g(\delta) + \delta, \tag{63} \]

which is exactly what we want to prove.

We can now state the main theorem of this section.
The result is crucial from the experimental point of view.
Namely, it states that for two close boxes (one of which
is the ideal box we aim to obtain, while the second is its
imperfect experimental realization) the measure of the
amount of contextuality of the experimentally obtained
box cannot differ from that of the target box by more than
the distance between the two boxes multiplied by a factor
depending only logarithmically on their dimension.

Theorem 2 The measure of contextuality $I_{\max}$ is
asymptotically continuous, i.e., for any two boxes B and
B' fulfilling $\|B - B'\| \le \delta$, there is

\[ |I_{\max}(B) - I_{\max}(B')| \le 6 g(\delta) + 3\delta, \tag{64} \]

with

\[ g(\delta) = 5\delta \log d + 2\eta(1 - \delta) + 3\eta(\delta). \tag{65} \]

Proof. The proof follows lines similar to those of the proof of Theorem 1. Let us first consider $I_{\max}(B)$. By the definition of supremum, there exists a sequence of distributions $\{p_n(c)\}$ such that

$$\lim_{n \to \infty} I_{p_n(c)}(B) = I_{\max}(B), \tag{66}$$

and similarly there exists $\{p'_n(c)\}$ such that

$$\lim_{n \to \infty} I_{p'_n(c)}(B') = I_{\max}(B'). \tag{67}$$


Let us now specify the following two sets:

$$T = \bigcup_{n=1}^{\infty} \{I_{p_n(c)}(B),\, I_{p'_n(c)}(B)\}, \tag{68}$$

$$T' = \bigcup_{n=1}^{\infty} \{I_{p_n(c)}(B'),\, I_{p'_n(c)}(B')\}, \tag{69}$$


and also denote by $V$ ($V'$) the set of $I_{p_n(c)}(B)$ ($I_{p'_n(c)}(B')$) for all distributions $\{p_n(c)\}$ ($\{p'_n(c)\}$). Applying reasoning similar to that in the proof of Theorem 1, but this time concerning the suprema, we arrive at

$$\sup T = I_{\max}(B), \tag{70}$$

and the same reasoning also gives

$$\sup T' = I_{\max}(B'). \tag{71}$$

Now, due to Theorem 1 there is

$$|I_{p_n(c)}(B) - I_{p_n(c)}(B')| \leq g(\delta) + \delta, \tag{72}$$

$$|I_{p'_n(c)}(B') - I_{p'_n(c)}(B)| \leq g(\delta) + \delta. \tag{73}$$


Let us take in the assumption of Lemma 1 the function $f : \mathbb{R} \to \mathbb{R}$ to be simply the identity function $f(x) = x$ (it is easy to see that the other assumptions are satisfied by construction, by the fact that the boxes $B$ and $B'$ are close to each other, and by inequalities (72)-(73)). Then, by Lemma 1 we obtain

$$\Big|\sup_{t \in T} t - \sup_{t' \in T'} t'\Big| \leq g(\delta) + 2\delta, \tag{74}$$

which is exactly what we wanted to prove.

From the two main theorems presented here we obtain an immediate corollary, which states that the relative entropy of contextuality is continuous with respect to quantum states, provided the measurements are ideal.

Corollary 1 For two quantum states $\rho$ and $\sigma$ such that $\|\rho - \sigma\| \leq \delta$, and for any set of quantum measurements $M = \{M_c\}$ which generates on these states the respective boxes $B = \{\mathrm{Tr} M_c \rho\}$ and $B' = \{\mathrm{Tr} M_c \sigma\}$, we have:

$$|I_{p(c)}(B) - I_{p(c)}(B')| \leq 2g(\delta) + 2\delta, \tag{75}$$

as well as

$$|I_{\max}(B) - I_{\max}(B')| \leq 12g(\delta) + 6\delta, \tag{76}$$

with $g(\delta) = 5\delta \log d + 2\eta(1 - \delta) + 3\eta(\delta)$.

Proof. This corollary follows directly from Observation 2 and Theorems 1 and 2.
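
To get a feel for the size of the bounds in Corollary 1, the following minimal sketch evaluates $g(\delta)$ and the right-hand sides of (75) and (76). It assumes that $\eta(x) = -x \log_2 x$ with $\eta(0) = 0$ and that all logarithms are base 2; neither convention is fixed in this section, so both are assumptions of the sketch, and the function names are illustrative only.

```python
import math


def eta(x: float) -> float:
    """Assumed form eta(x) = -x * log2(x), with eta(0) = 0 by continuity."""
    if x <= 0.0:
        return 0.0
    return -x * math.log2(x)


def g(delta: float, d: int) -> float:
    """g(delta) = 5*delta*log d + 2*eta(1 - delta) + 3*eta(delta), as in (65)."""
    return 5.0 * delta * math.log2(d) + 2.0 * eta(1.0 - delta) + 3.0 * eta(delta)


def corollary1_bounds(delta: float, d: int) -> tuple[float, float]:
    """Right-hand sides of (75) and (76) for a state distance delta."""
    bound_fixed_pc = 2.0 * g(delta, d) + 2.0 * delta   # bound (75)
    bound_imax = 12.0 * g(delta, d) + 6.0 * delta      # bound (76)
    return bound_fixed_pc, bound_imax


if __name__ == "__main__":
    # Both bounds shrink as the two states get closer (delta -> 0).
    for delta in (1e-4, 1e-3, 1e-2):
        print(delta, corollary1_bounds(delta, d=4))
```

Under these assumptions the bounds vanish as $\delta \to 0$ and grow only logarithmically with $d$, which is the content of asymptotic continuity.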


V. COMPUTABLE UPPER BOUND ON THE 
MEASURE OF A RESOURCE 

In this section we connect two measures of box resources: one based on the relative entropy, and another which reports how much it costs to create a box. When the resource is contextuality, we show that the relative entropy of contextuality for chain boxes is upper bounded (up to a normalization factor) by the cost of contextuality. In deriving this result we use the property (P1) of convexity of the relative entropy of contextuality, which was observed in [27] but without a formal proof; we provide one in Appendix B.

To achieve this, we show a general result which holds for any measure satisfying certain axioms. First, we assume that the sets $\mathcal{B}$ and $\mathcal{B}_{nv}$ are convex polytopes and $\mathcal{B}_{nv} \subset \mathcal{B}$. We now need the notion of the convex roof of a measure, which for a measure $X$ is defined as

$$\hat{X}(B) := \inf_{\{p_i, E_i\}} \sum_i p_i X(E_i), \tag{77}$$

where $E_i$ are extremal boxes of the polytope of all consistent boxes, and the infimum is taken over all ensembles of the box $B$ into extremal boxes, so that $\sum_i p_i E_i = B$. We begin with the following observation:

Observation 4 For any convex measure $X$ and $B \in \mathcal{B}$ we have

$$X(B) \leq \hat{X}(B). \tag{78}$$

Proof. It follows from the definition of $\hat{X}$.

Next we note that some of the extremal boxes of the polytope of consistent boxes are valuable ($E_v^i$), while the others are not ($E_{nv}^i$). In some cases such a polytope $\mathcal{B}$ satisfies the following property, which we call the vertex-equivalence property:

$$\forall_{E_v, E'_v \in \mathcal{B}}\ \exists_{R}\quad R(E_v) = E'_v, \tag{79}$$

where $E_v, E'_v$ are extremal valuable boxes. In other words, any extremal valuable box can be transformed into any other extremal valuable box by means of reversible operations.

We now show that if the polytope $\mathcal{B}$ has the vertex-equivalence property, a convex measure $X$ satisfying axioms (M1) and (M3) is upper bounded (up to a normalization factor $X(E_v)$ for some extremal valuable box $E_v$) by another measure, which we call the cost of the resource. We first formalize the latter measure as a generalization of the well-known cost of non-locality:

Definition 7 For a box $B \in \mathcal{B}$ the cost of the resource for this box is

$$C(B) = \inf_{p}\{p : B = pB_v + (1 - p)B_{nv},\ B_v \in \mathcal{B},\ B_{nv} \in \mathcal{B}_{nv}\}. \tag{80}$$

We are ready to present the main result of this section:

Proposition 1 Let $\mathcal{B}$ be a polytope of consistent boxes satisfying the vertex-equivalence property. Also let $X$ be a measure of resource acting on boxes $B \in \mathcal{B}$, which satisfies the axioms of (M1) faithfulness, (M3) local invariance and (P1) convexity. We then have

$$X(B) \leq C(B) X(E_v), \tag{81}$$

where $E_v$ is an arbitrarily fixed extremal valuable box in $\mathcal{B}$, and $C(\cdot)$ is the cost of the resource.

Proof. Let us fix $B \in \mathcal{B}$ arbitrarily. Since $X$ is convex, by Observation 4 we have:

$$X(B) \leq \inf_{\{p_i, E_i\}} \sum_i p_i X(E_i). \tag{82}$$

Now, for all $E_i \in \mathcal{B}_{nv}$ we have by axiom (M1) $X(E_i) = 0$. Thus we have

$$X(B) \leq \inf_{\{p_i, E_i\}} \sum_{i \in I} p_i X(E_i), \tag{83}$$

where $I$ is the set of indices for an ensemble $\{p_i, E_i\}$ such that if $i \in I$ then $E_i$ is a valuable box. On the other hand, by the vertex-equivalence property of the polytope $\mathcal{B}$ there is $X(E_i) = X(E_v)$ for all $i \in I$, with $E_v$ being some extremal valuable box fixed arbitrarily. Indeed, all the extremal valuable boxes have the same value of the measure $X$, since they are reversibly transformable one into another, and the measure $X$ satisfies the axiom (M3), i.e., does not change under such transformations. Hence

$$X(B) \leq X(E_v) \inf_{\{p_i, E_i\}} \sum_{i \in I} p_i, \tag{84}$$

and furthermore

$$\hat{X}(B) = X(E_v) \inf_{\{p_i, E_i\}} \sum_{i \in I} p_i. \tag{85}$$

We will now prove that the minimal $\sum_{i \in I} p_i$ equals exactly $C(B)$, which gives the thesis. Consider again any pure ensemble of $B$ into extremal boxes $E_i$ with probabilities $p_i$. We can perform a valid decomposition of the box $B$ into:

$$B = pB_v + (1 - p)B_{nv}, \tag{86}$$

where $p = \sum_{i \in I} p_i$ and, by definition of $I$: $B_v = \sum_{i \in I} p_i E_i / (\sum_{i \in I} p_i)$ and $B_{nv} = \sum_{i \notin I} p_i E_i / (\sum_{i \notin I} p_i)$. Thus, there is $C(B) \leq p$, and since the ensemble was arbitrary, we have that

$$C(B) \leq \inf_{\{p_i, E_i\}} \sum_{i \in I} p_i. \tag{87}$$

This, via (85), implies that

$$C(B) X(E_v) \leq \hat{X}(B). \tag{88}$$

We now prove the converse inequality. To begin with, consider any decomposition which achieves $C(B)$ (the case when the infimum in its definition is not attained is then obvious):

$$B = qB_v + (1 - q)B_{nv}. \tag{89}$$


Now, consider a decomposition of the box $B_v$ into extremal boxes. Since $q = C(B)$, there cannot be any deterministic box in the decomposition of $B_v$, or else we would obtain a smaller value of $C(B)$ by subtracting this box from $B_v$ and thereby lowering $q$. Thus, we can write $B_v = \sum_{j=1}^{N} r_j E_j$, with $E_j$ extremal valuable boxes. Let us also decompose the box $B_{nv}$ into extremal non-valuable boxes (this is possible, since this box is by definition not valuable, and the set $\mathcal{B}_{nv}$ is a convex polytope by assumption): $B_{nv} = \sum_{k=1}^{K} s_k D_k$. We then have an ensemble of the box $B$ of the form $\mathcal{Y} = \{(qr_1, \ldots, qr_N, (1-q)s_1, \ldots, (1-q)s_K), (E_1, \ldots, E_N, D_1, \ldots, D_K)\}$ for some natural numbers $N, K \geq 1$, i.e., by construction:

$$B = q\sum_{i=1}^{N} r_i E_i + (1 - q)\sum_{k=1}^{K} s_k D_k. \tag{90}$$

Now, we have $X(D_k) = 0$ for $k \in \{1, \ldots, K\}$. Consider then the set of indices $I = \{1, \ldots, N\}$. Since $\hat{X}$ is defined as a function which is minimized over all ensembles of the box $B$, it is upper bounded by the value of the function on the ensemble $\mathcal{Y}$. Thus we have:

$$\hat{X}(B) = \inf_{\{p_i, E_i\}} \sum_i p_i X(E_i) \leq \sum_{i \in I} q r_i X(E_i) = X(E_v) \sum_{i \in I} q r_i = X(E_v)\, q, \tag{91}$$

where the penultimate equality is due to the fact that for all $i \in I$, $X(E_i) = X(E_v)$, since the polytope of consistent boxes satisfies the vertex-equivalence property. This proves that

$$\hat{X}(B) \leq C(B) X(E_v), \tag{92}$$

which together with the opposite inequality (88) proves the thesis.
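
The regrouping step (86) used in the proof can be illustrated with a small sketch. Boxes are represented here as plain tuples of numbers and each ensemble member carries a flag saying whether it is valuable; this representation and the function name `regroup` are hypothetical choices made only for illustration. By (87), the returned weight `p` is an upper bound on $C(B)$ for the box described by the ensemble.

```python
from fractions import Fraction


def regroup(ensemble):
    """Split an ensemble [(p_i, E_i, valuable_i), ...] as in (86).

    Each box E_i is a tuple of numbers (a hypothetical vector representation).
    Returns (p, B_v, B_nv) with B = p*B_v + (1 - p)*B_nv, where p is the total
    weight of the valuable boxes. B_v (B_nv) is None if its total weight is zero.
    """
    dim = len(ensemble[0][1])

    def mix(items):
        weight = sum(p for p, _ in items)
        if weight == 0:
            return weight, None
        box = tuple(sum(p * e[k] for p, e in items) / weight for k in range(dim))
        return weight, box

    p, b_v = mix([(p, e) for p, e, valuable in ensemble if valuable])
    _, b_nv = mix([(p, e) for p, e, valuable in ensemble if not valuable])
    return p, b_v, b_nv


if __name__ == "__main__":
    half, quarter = Fraction(1, 2), Fraction(1, 4)
    # Correlator vectors for a 4-cycle: one contextual vertex, two noncontextual ones.
    ensemble = [
        (half, (1, 1, 1, -1), True),
        (quarter, (1, 1, 1, 1), False),
        (quarter, (1, -1, -1, 1), False),
    ]
    p, b_v, b_nv = regroup(ensemble)
    print(p, b_v, b_nv)
```

Exact rational arithmetic (`Fraction`) is used so that the reconstruction $B = pB_v + (1-p)B_{nv}$ can be checked without rounding error.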

Let us note here that when $\mathcal{B}_{nv}$ is a polytope, the cost of the resource can be computed by linear programming; hence the above proposition provides a computable upper bound on a convex measure satisfying (M1) and (M3).
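
For concreteness, the linear program can be written out as follows. This is a sketch that follows the proof of Proposition 1, where the minimal total weight of valuable extremal boxes over all ensembles of $B$ is shown to equal $C(B)$; the index set $I_v$ of valuable extremal boxes is notation introduced here only for illustration.

```latex
\begin{aligned}
C(B) \;=\; \min_{\{p_i\}} \quad & \sum_{i \in I_v} p_i \\
\text{subject to} \quad & \sum_{i} p_i E_i = B, \\
& \sum_{i} p_i = 1, \qquad p_i \geq 0 \quad \text{for all } i,
\end{aligned}
```

Here $E_i$ runs over all extremal boxes of $\mathcal{B}$, so both the objective and the constraints are linear in the weights $p_i$.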

In the next section we present two examples of polytopes which satisfy the vertex-equivalence property, and to which the above proposition therefore applies. We now present a rougher bound, which holds for all polytopes, including those which do not satisfy this property.

Proposition 2 Let $\mathcal{B}$ be a polytope of consistent boxes. Also let $X$ be a measure of a resource acting on boxes $B \in \mathcal{B}$, which satisfies the axioms of (M1) faithfulness and (P1) convexity. We then have

$$X(B) \leq C(B) \max_{E_v \in \mathcal{B}} X(E_v), \tag{93}$$

where the maximum is taken over all extremal valuable boxes $E_v$ from $\mathcal{B}$, and $C(\cdot)$ is the cost of the resource.

Proof. The proof goes along lines similar to the second part of the proof of Proposition 1, namely starting right after the inequality (88). We then repeat the proof up to the sequence of (in)equalities (91) and modify them, observing that for $s = \sum_{i \in I} q r_i$

$$\sum_{i \in I} q r_i X(E_i) \leq \max_{i \in I} X(E_i) \times s \leq \max_{E_v} X(E_v) \times s, \tag{94}$$

where in the first inequality the maximum is taken over extremal valuable boxes $E_i$ from the particular ensemble of the box $B$, while in the second it is taken over all extremal valuable boxes from the polytope $\mathcal{B}$. Following the remaining part of the proof of Proposition 1, we obtain:

$$X(B) \leq C(B) \times \max_{E_v} X(E_v), \tag{95}$$

which is a less tight version of the inequality (81), as required.


A. Examples of the upper bounds on $X_{\max}$ via contextuality (non-locality) cost

We can now apply Proposition 1 in two cases. The first is the case of the set $\mathcal{B}$ equal to the set of all bipartite non-signaling boxes with two binary inputs and two binary outputs (we denote it $\mathcal{B}(2 \times 2)$). In this case, it is known that the only extremal valuable boxes of $\mathcal{B}$ have the form:

$$B_{rst}(a, b|x, y) = \begin{cases} 1/2 & \text{if } a \oplus b = xy \oplus rx \oplus sy \oplus t \\ 0 & \text{else,} \end{cases} \tag{96}$$

where $a, b, x, y, r, s, t$ are binary.

It is then clear from the above form that all extremal valuable boxes can be transformed reversibly into the PR-box. Moreover, the set of all non-valuable boxes is called the set of local boxes, and it is a convex polytope $\mathcal{B}_{nv}(2 \times 2) \subset \mathcal{B}(2 \times 2)$. We then have the following corollary:

Corollary 2 For any box $B \in \mathcal{B}(2 \times 2)$ there holds:

$$X_{\max}(B) \leq \inf_{\{p_i, B_i\}} \sum_i p_i X_{\max}(B_i) = C(B) \times \log\frac{4}{3}, \tag{97}$$

where $B_i$ are extremal boxes and $C(B)$ is the cost of non-locality.

Proof. It is straightforward to check that $X_{\max}$ satisfies the axioms (M1) and (M3) and, as was mentioned in [27] and is proved in Appendix B, $X_{\max}$ is convex. Moreover, as we have mentioned before, all the non-local boxes with two binary inputs and outputs can be transformed by local reversible operations into the PR-box, i.e., $B_{000}$, for which $X_{\max}(B_{000}) = \log\frac{4}{3}$ 0. Hence, by Proposition 1 we have the thesis.

We now generalize the above result to the case of resources of contextuality. We consider a class of boxes corresponding to n-cycle hypergraph scenarios (a class of $CH^{(n)}$ boxes) as presented in Ref. 0. Any box $B \in CH^{(n)}$ is given by $B = \{p_B(A_c)\}$, where the probability distribution for each context $c$ can be written as

$$p_B(A_{c_i}) = p(m_i m_{i+1}|M_i M_{i+1}) = \frac{1}{4}\left(1 + m_i m_{i+1} \langle m_i m_{i+1} \rangle\right), \tag{98}$$

with $m_j = \pm 1$, where we use the convention $m_n m_{n+1} = m_1 m_n$.

The corresponding polytope of boxes compatible with this hypergraph will be called $\mathcal{B}_n$. Let us note here that $\mathcal{B}_4 = \mathcal{B}(2 \times 2)$. It is known 0 that the contextuality measure of any extremal valuable box $E_v$ for arbitrary $n$ is given by

$$X_{\max}(E_v) = \log\frac{n}{n - 1}. \tag{99}$$
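
For the bipartite case of Corollary 2, the claim that every extremal valuable box of the form (96) can be reversibly transformed into the PR-box $B_{000}$ can be checked by brute force. The sketch below uses one particular local relabeling of outputs, $a \to a \oplus rx \oplus t$ on Alice's side and $b \to b \oplus sy$ on Bob's side; this choice is an assumption made for illustration (other relabelings work as well), and the function names are hypothetical.

```python
from itertools import product


def box_rst(r: int, s: int, t: int) -> dict:
    """Box of (96): p(a,b|x,y) = 1/2 iff a xor b == xy xor rx xor sy xor t."""
    return {
        (a, b, x, y): 0.5 if (a ^ b) == ((x & y) ^ (r & x) ^ (s & y) ^ t) else 0.0
        for a, b, x, y in product((0, 1), repeat=4)
    }


def relabel_to_pr(box: dict, r: int, s: int, t: int) -> dict:
    """Local reversible relabeling of outputs, each depending only on the local input:
    a -> a xor r*x xor t (Alice), b -> b xor s*y (Bob)."""
    out = {}
    for (a, b, x, y), prob in box.items():
        out[(a ^ (r & x) ^ t, b ^ (s & y), x, y)] = prob
    return out


if __name__ == "__main__":
    pr_box = box_rst(0, 0, 0)
    for r, s, t in product((0, 1), repeat=3):
        assert relabel_to_pr(box_rst(r, s, t), r, s, t) == pr_box
    print("all 8 boxes B_rst map to the PR-box B_000")
```

Because each party's relabeling is a bijection on its own output for every local input, the operation is reversible, in line with the vertex-equivalence property (79).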


Corollary 3 For a box $B \in \mathcal{B}_n$ we have

$$X_{\max}(B) \leq \hat{X}_{\max}(B) = C(B) \log\frac{n}{n - 1}. \tag{100}$$

Proof. According to the description of a box $B \in CH^{(n)}$ given in Eq. (98), it can be uniquely described by a collection of $n$ correlators

$$B = (\langle m_1 m_2 \rangle, \langle m_2 m_3 \rangle, \ldots, \langle m_n m_1 \rangle). \tag{101}$$

Now, for any given $n$ we have $2^{n-1}$ extremal contextual boxes of the form (101), where $\langle m_i m_{i+1} \rangle = \pm 1$, such that the number of negative components is odd [19]. It now suffices to observe that we can obtain all the other extremal contextual boxes from, e.g., $(-1, 1, 1, 1, \ldots, 1, 1)$ simply by bit-flipping chosen outputs, which is a contextuality-preserving operation (a particular form of relabeling the outputs). Indeed, by performing a bit-flip $m_j \to -m_j$ we change the sign of the pair of neighboring correlators which contain $m_j$. For instance, performing the bit-flip $m_2 \to -m_2$ on the box $(-1, 1, 1, 1, \ldots, 1, 1)$ produces another extremal contextual box $(1, -1, 1, 1, \ldots, 1, 1)$. Consequently, performing a bit-flip on each consecutive $m_j$, we can generate all extremal contextual boxes with exactly one correlator equal to $-1$. Now, given the box $(1, 1, 1, 1, \ldots, 1, -1)$, we again perform the bit-flip $m_2 \to -m_2$, which produces the box $(-1, -1, 1, 1, \ldots, 1, -1)$, from which we can generate all the boxes with 3 correlators equal to $-1$. In so doing we can generate all $2^{n-1}$ extremal contextual boxes (all those with an odd number of correlators equal to $-1$) by contextuality-preserving operations. We see then that $\mathcal{B}_n$ satisfies the vertex-equivalence property. Moreover, $X_{\max}$ does not change under bit-flips of outputs of some of the observables, hence the assumptions of Proposition 1 are satisfied. In consequence the bound (100) is true.
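
The bit-flip argument can be verified exhaustively for small $n$. In the sketch below a box is the tuple of correlators (101), and flipping $m_j$ changes the sign of the two correlators that contain $m_j$, namely $\langle m_{j-1} m_j \rangle$ and $\langle m_j m_{j+1} \rangle$ (indices cyclic). The function names are illustrative; the search simply collects everything reachable from $(-1, 1, \ldots, 1)$.

```python
def flip(correlators: tuple, j: int) -> tuple:
    """Bit-flip m_j -> -m_j (j = 1..n) on a box written as in (101).

    With correlators (<m_1 m_2>, ..., <m_n m_1>), m_j enters the correlators
    at 0-based positions j-2 and j-1 (cyclic), so exactly those two change sign.
    """
    n = len(correlators)
    out = list(correlators)
    out[(j - 2) % n] *= -1  # <m_{j-1} m_j>
    out[(j - 1) % n] *= -1  # <m_j m_{j+1}>
    return tuple(out)


def reachable(n: int) -> set:
    """All boxes reachable from (-1, 1, ..., 1) by sequences of bit-flips."""
    start = (-1,) + (1,) * (n - 1)
    seen, frontier = {start}, [start]
    while frontier:
        box = frontier.pop()
        for j in range(1, n + 1):
            nxt = flip(box, j)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


if __name__ == "__main__":
    for n in range(3, 8):
        boxes = reachable(n)
        # Every reachable box has an odd number of correlators equal to -1.
        assert all(sum(c == -1 for c in b) % 2 == 1 for b in boxes)
        print(n, len(boxes), 2 ** (n - 1))
```

Since each flip changes two signs, the parity of the number of $-1$ correlators is preserved, and the search confirms that all $2^{n-1}$ odd-parity sign patterns are reached.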
VI. A BOUND ON DISTILLABLE RESOURCE
IN BOX THEORIES 

In this section we consider a scenario analogous to the distillation of entanglement in entanglement theory. Namely, we assume that $n$ copies of a box $B$ are provided to a party (or parties in the case of non-locality). We distinguish a target box $B_v^T$ which is valuable, an approximation of which one wants to “distill” out of $n$ copies of $B$. We demand that the distilling operations satisfy the axiom (O2), i.e., do not create a valuable box out of non-valuable ones. We define the distillable resource $D(B_v^T|B)$ for a box $B$, in a manner analogous to the definition of distillable entanglement [30, 31], as the highest ratio of the number $k$, such that the output of the distillation protocol approximates $[B_v^T]^{\otimes k}$, to the number $n$ of used boxes $B$, in the asymptotic limit of large $n$. The main result of this section states that the so-called regularized measure of resource $X^{\infty}$, for which $X$ satisfies the axioms (M2) of monotonicity and (M4) of asymptotic continuity, is, up to a constant factor $X(B_v^T)$, an upper bound on the distillable resource $D(B_v^T|B)$:

$$X^{\infty}(B) \geq X(B_v^T)\, D(B_v^T|B), \tag{102}$$

where $X^{\infty}(B) = \lim_{n \to \infty} \frac{X(B^{\otimes n})}{n}$ is the regularized measure $X$.


A. Proof of the upper bound 

In this section we prove inequality (102) under some assumptions on the measure $X$ and the target box $B_v^T$. We begin with the definition of the rate of distillability of a box $B_v^T$ from a box $B$, denoted $D(B_v^T|B)$.

Definition 8 For a box $B \in \mathcal{B}$, consider a sequence $\Lambda_n$ of operations satisfying the axioms (O1) and (O2), such that $\Lambda_n(B^{\otimes n}) = B_n$. The set $\mathcal{D} = \{\Lambda_n\}$ is called a protocol distilling a target box $B_v^T$ from $B$ if

$$\lim_{n \to \infty} \|B_n - [B_v^T]^{\otimes k_n}\| = 0. \tag{103}$$

For a given distillation protocol $\mathcal{D}$, its rate is given by

$$r(\mathcal{D}) = \limsup_{n \to \infty} \frac{k_n}{n}. \tag{104}$$

The rate of distillability of the box $B_v^T$ from a box $B$ is given by:

$$D(B_v^T|B) = \sup_{\mathcal{D}} r(\mathcal{D}). \tag{105}$$

We can proceed to show the main result of this section, which states that an asymptotically continuous and monotonic measure of resource is, up to a constant factor, an upper bound on $D(B_v^T|B)$, as stated in the proposition below.


Proposition 3 Let $B_v^T \in \mathcal{B}_v$ be some target box. Let $X$ be a measure which satisfies the axioms of (M2) monotonicity and (M4) asymptotic continuity, and also let $X$ be superadditive on $B_v^T$. Then we have:

$$D(B_v^T|B)\, X(B_v^T) \leq X^{\infty}(B). \tag{106}$$


Proof. Consider $n$ copies of a box $B$. The purpose is to distill the largest number of (approximate) copies of the target box $B_v^T$. Let us fix $\delta > 0$. Then there exists a protocol $\mathcal{D} = \{\Lambda_n\}$ such that $r(\mathcal{D}) \geq D(B_v^T|B) - \delta$. It follows also that for sufficiently large $n$

$$\Lambda_n(B^{\otimes n}) = B_n, \tag{107}$$

such that there holds $\|B_n - [B_v^T]^{\otimes k_n}\| \leq \epsilon_n$, where $0 < \epsilon_n \to 0$ with $n \to \infty$. Then we have the following chain of inequalities:

$$X(B^{\otimes n}) \geq X(\Lambda_n(B^{\otimes n})) = X(B_n) \geq X([B_v^T]^{\otimes k_n}) - f(\epsilon_n) \geq k_n X(B_v^T) - f(\epsilon_n), \tag{108}$$

where the first inequality holds by the axiom (M2) of monotonicity of $X$ under operations $\Lambda_n$ satisfying the axiom (O2). The first equality is by (107); the second inequality is by asymptotic continuity of $X$, where $f(\cdot)$ is some continuous function. The last inequality is by the assumption of superadditivity of $X$ on the box $[B_v^T]^{\otimes k_n}$ (see in this context Theorem 9 of El).

If we now divide the first and the last term of the above chain of (in)equalities by $n$, we have:

$$\frac{X(B^{\otimes n})}{n} \geq X(B_v^T)\frac{k_n}{n} - \frac{f(\epsilon_n)}{n}. \tag{109}$$

In fact, the left-hand side of the above inequality approaches $X^{\infty}$ in the limit $n \to \infty$ and, by continuity of $f$, the right-hand side approaches $X(B_v^T)[D(B_v^T|B) - \delta]$, as expected. Since $\delta$ was arbitrary, taking the limit $\delta \to 0$ proves inequality (106).

We now have the following remark:

Remark 1 If additionally the measure $X$ is subadditive, there is $X \geq X^{\infty}$; hence, by the above proposition, we obtain:

$$X(B) \geq X(B_v^T)\, D(B_v^T|B). \tag{110}$$


B. Protocol of distillation of contextuality 

Below we consider a particular example of a distillation protocol for contextual resources. The protocol uses two copies of a weakly contextual resource and transforms them into a single copy of a more contextual resource. We show this using a specific indicator of contextuality $\beta$ (which quantifies the violation of a specific contextual inequality) rather than the measure of contextuality $X_{\max}$, which is not easy to calculate for the boxes used in the distillation process. The distillation protocol itself is easily implementable in an experiment, as it involves only postprocessing of the data (measurement outcomes).

First let us describe in detail the contextual resources from which we distill a more valuable resource.

An XOR-box is a consistent box $\{p(a_1, a_2, \ldots, a_m|c_i)\}$ such that for each input $c_i$ and binary outputs $a_1, \ldots, a_m \in \{0, 1\}$, either ($p^{od}$: odd context)

$$p^{od}(a_1, \ldots, a_m|c_i) = \begin{cases} \frac{1}{2^{m-1}} & \text{if } \bigoplus_{j=1}^{m} a_j = 1 \\ 0 & \text{otherwise,} \end{cases} \tag{111}$$

or ($p^{e}$: even context)

$$p^{e}(a_1, \ldots, a_m|c_i) = \begin{cases} \frac{1}{2^{m-1}} & \text{if } \bigoplus_{j=1}^{m} a_j = 0 \\ 0 & \text{otherwise.} \end{cases} \tag{112}$$

Note that an XOR-box can in principle be contextual or noncontextual. For the rest of this section, by an XOR-box we mean only a contextual box, examples of which are presented in Ref. 0 under the names of PM-box, M-box and CH-box.

A correlated box is a box such that for each input $c_i$ and binary outputs $a_1, \ldots, a_m \in \{0, 1\}$ there is

$$p(a_1, \ldots, a_m|c_i) = \begin{cases} \frac{1}{2^{m-1}} & \text{if } \bigoplus_{j=1}^{m} a_j = 0 \\ 0 & \text{otherwise.} \end{cases} \tag{113}$$

All correlated boxes which correspond to a hypergraph of a certain XOR-box are noncontextual: a correlated box can be obtained from a single joint probability distribution $p(\mathcal{M})$ which is decomposable into probabilistic points with the property $\bigoplus_{j=1}^{m} a_j = 0$ for all $c_i$.

The protocol of distillation that we employ was originally used in [20] for distilling nonlocal resources, whereas in terms of contextual resources it works as follows: on the inputs $c_i$ of two copies of a resource one inputs the same number and then receives outputs $(a_1, \ldots, a_m)$ and $(a'_1, \ldots, a'_m)$, respectively. Then one computes the final output as $(a_1 \oplus a'_1, \ldots, a_m \oplus a'_m)$. We call this procedure a node wise XOR operation.

Furthermore, we use the parameter $\beta$ (for a definition see Ref. [16]) as a contextuality indicator of a given box, by means of violating the contextual inequality: if we denote by $B_x$ a reference extremal isotropic XOR-box, then for any contextual box $B$ the contextual inequality

$$\beta_{B_x}(B) \leq n - 1 \tag{114}$$

($n$ being the number of contexts) is violated. In the following theorem we show that by performing the XOR operation on two copies of a contextual box one can concentrate the contextuality content, in the sense of increasing the value of $\beta$.

Theorem 3 Let $\Lambda : \mathcal{B}_v \otimes \mathcal{B}_v \to \mathcal{B}_v$ be a linear and noncontextuality-preserving node wise operating map. There exists a map $\Lambda$ such that

$$\beta_{B_x}(\Lambda(B_v^{\otimes 2})) > \beta_{B_x}(B_v), \tag{115}$$

where $B_v \in \mathcal{B}_v$, and $\beta_{B_x}(B_v) = 2^{m-1}\langle B_x|B_v \rangle$, where $B_x$ is an extremal isotropic XOR-box.

The theorem says that there exists a noncontextuality-preserving map which can be used for distillation of contextuality. Indeed, this result holds for any even number of copies of a box $B_v$. Below we prove the theorem for $\Lambda$ being an XOR operation. Then we show that a node wise XOR operation is noncontextuality preserving, i.e., it cannot distill a contextual resource from noncontextual boxes $B \in \mathcal{B}_{nv}$.

Proof. Let $\Lambda_{\mathrm{XOR}}$ be the node wise XOR operation, i.e., for any context $c_i$ it acts as follows:

$$p(a_1, \ldots, a_m|c_i) \otimes p(a'_1, \ldots, a'_m|c_i) \xrightarrow{\Lambda_{\mathrm{XOR}}} p(a_1 \oplus a'_1, \ldots, a_m \oplus a'_m|c_i). \tag{116}$$

Consider probability distributions $p^{od}$, $p^{e}$ which constitute an XOR-box. Let us check how $\Lambda_{\mathrm{XOR}}$ acts on different compositions of $p^{od}$ and $p^{e}$:

$$p^{e} \otimes p^{e} = p(\oplus_{j}^{m} a_j = 0|c_i) \otimes p(\oplus_{j}^{m} a'_j = 0|c_i) \xrightarrow{\Lambda_{\mathrm{XOR}}} p(\oplus_{j}^{m} (a_j \oplus a'_j) = 0|c_i) = p^{e}, \tag{117}$$

where we used the simple identity $(\oplus_{j}^{m} a_j) \oplus (\oplus_{j}^{m} a'_j) = \oplus_{j}^{m} (a_j \oplus a'_j)$. Similarly one can show the following:

$$p^{e} \otimes p^{od} \xrightarrow{\Lambda_{\mathrm{XOR}}} p^{od}, \qquad p^{od} \otimes p^{e} \xrightarrow{\Lambda_{\mathrm{XOR}}} p^{od}, \qquad p^{od} \otimes p^{od} \xrightarrow{\Lambda_{\mathrm{XOR}}} p^{e}. \tag{118}$$
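
The composition rules (117)-(118) can be confirmed by brute force for a single context. The sketch below builds the even and odd distributions of (111)-(112) over $m$-bit strings and applies the node wise XOR of (116); the function names `parity_dist` and `xor_compose` are illustrative, not taken from the source.

```python
from fractions import Fraction
from itertools import product


def parity_dist(m: int, parity: int) -> dict:
    """Uniform distribution over m-bit strings whose bits XOR to `parity`:
    p^e for parity 0 and p^od for parity 1, as in (111)-(112)."""
    weight = Fraction(1, 2 ** (m - 1))
    return {a: weight for a in product((0, 1), repeat=m) if sum(a) % 2 == parity}


def xor_compose(p: dict, q: dict) -> dict:
    """Node wise XOR of two single-context distributions, as in (116)."""
    out = {}
    for (a, pa), (b, qb) in product(p.items(), q.items()):
        c = tuple(x ^ y for x, y in zip(a, b))
        out[c] = out.get(c, 0) + pa * qb
    return out


if __name__ == "__main__":
    m = 3
    p_e, p_od = parity_dist(m, 0), parity_dist(m, 1)
    assert xor_compose(p_e, p_e) == p_e      # (117)
    assert xor_compose(p_e, p_od) == p_od    # (118), first line
    assert xor_compose(p_od, p_e) == p_od    # (118), second line
    assert xor_compose(p_od, p_od) == p_e    # (118), third line
    print("rules (117)-(118) hold for m =", m)
```

Exact fractions are used so that the equalities between distributions hold without floating-point tolerance.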

Consider now a box $B$ defined as a linear combination of the extremal isotropic XOR-box $B_x$ and a correlated box $B_c$:

$$B = \alpha B_x + (1 - \alpha) B_c. \tag{119}$$

It is easy to verify that

$$\beta_{B_x}(B) = 2^{m-1}\langle B_x|B \rangle = 2^{m-1}\left(\alpha \langle B_x|B_x \rangle + (1 - \alpha)\langle B_x|B_c \rangle\right) = n - 1 + \alpha, \tag{120}$$

since $\langle B_x|B_x \rangle = n/2^{m-1}$ and $\langle B_x|B_c \rangle = (n - 1)/2^{m-1}$. We see that the contextual inequality (114) is violated for any $\alpha \in (0, 1]$, therefore $B \in \mathcal{B}_v$ except for $\alpha = 0$.

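For two copies of the box $B$ we have

$$B^{\otimes 2} = \alpha^2 B_x^{\otimes 2} + (1 - \alpha)^2 B_c^{\otimes 2} + \alpha(1 - \alpha)(B_x \otimes B_c + B_c \otimes B_x), \tag{121}$$

and after the node wise XOR operation

$$\beta_{B_x}(\Lambda_{\mathrm{XOR}}(B^{\otimes 2})) = 2^{m-1}\langle B_x|\Lambda_{\mathrm{XOR}}(B^{\otimes 2}) \rangle = 2^{m-1}\Big(\alpha^2 \langle B_x|\Lambda_{\mathrm{XOR}}(B_x^{\otimes 2}) \rangle + (1 - \alpha)^2 \langle B_x|\Lambda_{\mathrm{XOR}}(B_c^{\otimes 2}) \rangle + \alpha(1 - \alpha)\langle B_x|\Lambda_{\mathrm{XOR}}(B_x \otimes B_c + B_c \otimes B_x) \rangle\Big). \tag{122}$$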

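Taking into account (118) one can show that

$$\langle B_x|\Lambda_{\mathrm{XOR}}(B_c^{\otimes 2}) \rangle = (n - 1)\langle p^{e}|p^{e} \rangle + \langle p^{od}|p^{e} \rangle = (n - 1)/2^{m-1}, \tag{123}$$

$$\langle B_x|\Lambda_{\mathrm{XOR}}(B_x^{\otimes 2}) \rangle = (n - 1)\langle p^{e}|p^{e} \rangle + \langle p^{od}|p^{e} \rangle = (n - 1)/2^{m-1}, \tag{124}$$

and

$$\langle B_x|\Lambda_{\mathrm{XOR}}(B_x \otimes B_c + B_c \otimes B_x) \rangle = 2(n - 1)\langle p^{e}|p^{e} \rangle + 2\langle p^{od}|p^{od} \rangle = 2n/2^{m-1}, \tag{125}$$

since for single-context probability vectors $\langle p^{e}|p^{e} \rangle = 1/2^{m-1}$ and $\langle p^{od}|p^{e} \rangle = 0$.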

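Inserting these values into equation (122) we get

$$\beta_{B_x}(\Lambda_{\mathrm{XOR}}(B^{\otimes 2})) = (\alpha^2 + (1 - \alpha)^2)(n - 1) + 2\alpha(1 - \alpha)n. \tag{126}$$

The right-hand side simplifies to $n - 1 + 2\alpha(1 - \alpha)$, which exceeds $n - 1 + \alpha$ exactly when $2(1 - \alpha) > 1$. Hence, by comparing (120) with (126), one can see that for $0 < \alpha < 1/2$ we obtain

$$\beta_{B_x}(\Lambda_{\mathrm{XOR}}(B^{\otimes 2})) > \beta_{B_x}(B). \tag{127}$$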
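
The comparison of (120) with (126) can also be reproduced numerically. The following sketch builds $B_x$ and $B_c$ as lists of per-context distributions and computes $\beta$ before and after one node wise XOR of two copies. It assumes that $B_x$ has exactly one odd context and $n - 1$ even ones, which is consistent with the overlaps quoted below (120), and it treats contexts as independent tables, which is sufficient for evaluating $\beta$; all names are illustrative.

```python
from fractions import Fraction
from itertools import product


def parity_dist(m, parity):
    """Uniform distribution on m-bit strings whose bits XOR to `parity`."""
    weight = Fraction(1, 2 ** (m - 1))
    return {a: weight for a in product((0, 1), repeat=m) if sum(a) % 2 == parity}


def xor_compose(p, q):
    """Node wise XOR of two single-context distributions."""
    out = {}
    for (a, pa), (b, qb) in product(p.items(), q.items()):
        c = tuple(x ^ y for x, y in zip(a, b))
        out[c] = out.get(c, 0) + pa * qb
    return out


def inner(box_a, box_b):
    """<A|B>: sum over contexts and outputs of A(a|c) * B(a|c)."""
    return sum(pa[a] * pb.get(a, 0) for pa, pb in zip(box_a, box_b) for a in pa)


def beta(box_x, box, m):
    """beta_{B_x}(B) = 2^(m-1) <B_x|B>."""
    return 2 ** (m - 1) * inner(box_x, box)


def distill_once(n, m, alpha):
    """Return (beta before, beta after one node wise XOR of two copies)."""
    b_x = [parity_dist(m, 0)] * (n - 1) + [parity_dist(m, 1)]  # one odd context (assumed)
    b_c = [parity_dist(m, 0)] * n                               # correlated box
    keys = list(product((0, 1), repeat=m))
    b = [{a: alpha * px.get(a, 0) + (1 - alpha) * pc.get(a, 0) for a in keys}
         for px, pc in zip(b_x, b_c)]
    b_out = [xor_compose(ctx, ctx) for ctx in b]  # same input to both copies
    return beta(b_x, b, m), beta(b_x, b_out, m)


if __name__ == "__main__":
    for alpha in (Fraction(1, 8), Fraction(1, 4), Fraction(3, 4)):
        before, after = distill_once(n=4, m=2, alpha=alpha)
        print(alpha, before, after, after > before)
```

For $\alpha < 1/2$ the indicator increases, while for $\alpha > 1/2$ this single XOR step lowers it, matching the range stated in the proof.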

Note: we conjecture that the node wise XOR operation is the only operation which results in distillation.

We now show that the node wise XOR operation is indeed noncontextuality preserving, i.e., it satisfies the axiom (O2). Suppose that we aim to distill contextuality from noncontextual boxes $B$ and $B'$. We need to show that the box $B'' = \Lambda_{\mathrm{XOR}}(B \otimes B')$ is also noncontextual. Now, since the box $B = \{p(a_1, \ldots, a_m|c_i)\}$ ($B' = \{p'(a'_1, \ldots, a'_m|c_i)\}$) is noncontextual, there exists a joint probability distribution $p(a_1, a_2, \ldots)$ ($p'(a'_1, a'_2, \ldots)$) for all observables in $\mathcal{M}$. Denote by $A$ ($A'$) a string of outputs $(a_1, a_2, \ldots)$ ($(a'_1, a'_2, \ldots)$), of which there are $2^{|\mathcal{M}|}$, so that $p(a_1, a_2, \ldots)$ ($p'(a'_1, a'_2, \ldots)$) is a linear combination of deterministic points indexed by $A$ ($A'$), each with probability $p(A)$ ($p'(A')$). Note that a node wise XOR operation is a map $\Lambda_{\mathrm{XOR}} : \{A\} \times \{A'\} \to \{A''\}$, where the strings of outputs $A''$ are defined by

$$(a''_1, a''_2, \ldots) = (a_1 \oplus a'_1, a_2 \oplus a'_2, \ldots). \tag{128}$$

The box $B''$ is then a linear combination of deterministic points indexed by $A''$, each with probability

$$p''(a''_1, a''_2, \ldots) = \sum_{\{A\} \times \{A'\}|_{\oplus}} p(a_1, a_2, \ldots)\, p'(a'_1, a'_2, \ldots), \tag{129}$$

where the above sum is over all compositions of strings $A$ and $A'$ such that (128) holds. For example, in the case of $|\mathcal{M}| = 2$ we would have, e.g., for the string $(a''_1 = 0, a''_2 = 1)$

$$p''(01) = p(00)p'(01) + p(01)p'(00) + p(10)p'(11) + p(11)p'(10). \tag{130}$$


Note also that $p''(a''_1, a''_2, \ldots)$ forms a well-defined probability distribution, because summing all probabilities gives

$$\sum_{\{A''\}} p''(a''_1, a''_2, \ldots) = \sum_{\{A\} \times \{A'\}} p(a_1, a_2, \ldots)\, p'(a'_1, a'_2, \ldots) = \sum_{\{A\}} \sum_{\{A'\}} p(a_1, a_2, \ldots)\, p'(a'_1, a'_2, \ldots) = 1. \tag{131}$$

The first equality is based on the observation that the inverse images under the map $\Lambda_{\mathrm{XOR}}$ of all elements in $\{A''\}$ form disjoint partitions of the entire product set $\{A\} \times \{A'\}$. Thus we have shown that the box $B''$ is noncontextual, since there exists a joint probability distribution $p''(a''_1, a''_2, \ldots)$ which defines $B''$. Similarly one can show that any node wise operation is also noncontextuality preserving.
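
The construction of the joint distribution $p''$ in (129), together with the example (130) and the normalization (131), can be sketched in a few lines. The input distributions below are arbitrary illustrative numbers, not data from the source, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product


def xor_pushforward(p: dict, p_prime: dict) -> dict:
    """p''(A'') = sum over pairs (A, A') with A xor A' = A'' of p(A) * p'(A'), as in (129)."""
    out = {}
    for (a, pa), (b, pb) in product(p.items(), p_prime.items()):
        c = tuple(x ^ y for x, y in zip(a, b))
        out[c] = out.get(c, 0) + pa * pb
    return out


if __name__ == "__main__":
    F = Fraction
    # Two arbitrary joint distributions over |M| = 2 binary observables.
    p = {(0, 0): F(1, 2), (0, 1): F(1, 4), (1, 0): F(1, 8), (1, 1): F(1, 8)}
    q = {(0, 0): F(1, 3), (0, 1): F(1, 3), (1, 0): F(1, 6), (1, 1): F(1, 6)}
    r = xor_pushforward(p, q)
    # Example (130): the four pairs of strings that XOR to (0, 1).
    expected = (p[(0, 0)] * q[(0, 1)] + p[(0, 1)] * q[(0, 0)]
                + p[(1, 0)] * q[(1, 1)] + p[(1, 1)] * q[(1, 0)])
    assert r[(0, 1)] == expected
    assert sum(r.values()) == 1  # normalization (131)
    print(r)
```

Each pair $(A, A')$ contributes to exactly one output string $A''$, which is why the total probability is preserved.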

C. Towards application of Proposition 3

We can now consider for which resources and measures the assumptions of the above proposition are satisfied. We consider the bipartite scenario with non-local correlations as a resource, since the case of contextuality follows the same lines and faces the same problems.

1. (Possible bound via $X_{\max}$) Consider $B_v^T$ to be the PR-box, and the measure $X$ to be $X_{\max}$. Then $X_{\max}$ is additive on $B_v^T$ (see [16]). By Theorem 2, $X_{\max}$ is also asymptotically continuous. We can consider a distillation protocol via a restricted set of operations, namely the wirings [23], since for suitably defined non-valuable, i.e., local boxes [32] (see the easier formulation in [33]) they transform local boxes into local ones, and hence satisfy the axiom (O2) (see Appendix C). However, one would need a proof that $X_{\max}$ does not increase under wirings, which we leave as an open question. It is easy to check that for isotropic boxes $PR_{\alpha} = \alpha B_{000} + (1 - \alpha)B_{001}$ (for a formal definition see 0) this bound would be nontrivial in the whole range of $\alpha \in (3/4, 1]$.

2. (On a bound via $X_u$) Similarly to $X_{\max}$, the measure $X_u$ is additive on PR-boxes, and is asymptotically continuous via a proof analogous to that of Theorem 2. However, it is definitely not monotonic under general operations which satisfy the axiom (O2), i.e., those that transform local boxes into local ones. This is because it can increase under partial trace. Indeed, consider a box with a hypergraph $G$ equal to a direct sum of two hypergraphs $G_1 \oplus G_2$, such that $G_2$ has two vertices connected by a single edge, and the context corresponding to this edge is held locally by Alice. Let also $G_1$ be a hypergraph of a non-local box. Let now the parties have a box $B$ corresponding to $G$, which consists of a PR-box and a local box held by Alice, called $L$, so that the box $B$ equals $PR \oplus L$. By Theorem 8 of [16] there is

X U (PR © L) = | X U (PR ) + ix u (L). (132) 

Now, since $X_u(L) = 0$ as this box is local, we have that $X_u(PR \oplus L) < X_u(PR)$; hence, by removing/adding $L$ one can increase/decrease the value of $X_u$. Let us note here that $X_{\max}$ does not suffer from the same problem, as

$$X_{\max}(PR \oplus L) = \max\{X_{\max}(PR), X_{\max}(L)\} = X_{\max}(PR), \tag{133}$$

so that adding or removing a local box does not change the value of $X_{\max}$. Despite the fact that $X_u$ is not monotonic under locality-preserving operations, monotonicity under wirings is still possible for it, which we also leave as an open problem.


It is worth mentioning that the measure of contextuality $X_u$ is a normalized version of the nonlocality quantifier referred to in Ref. [11]. Notice, however, that although the unnormalized statistical distance measure of nonlocality, given by the infimum over local distributions, may increase under local transformations (in particular when enlarging the number of inputs of a box [11]), this is not necessarily so when the number of added inputs is properly accounted for. Thus, a normalized measure of contextuality ($X_u$ as well as $X_{\max}$) prevents the increase of the relative entropy when a trivial expansion of the number of contexts takes place.


VII. CONCLUSIONS 

Using an axiomatic approach common to resource theories, we have developed the theories of contextuality and of its most celebrated example, non-locality. Crucially from the experimental point of view, we have studied the axiom of asymptotic continuity, and proved that a recently established measure of contextuality, the relative entropy of contextuality [16], obeys that axiom. We have thereby shown that for an experimental setup which produces an imperfect box $B'$, close to the intended box $B$, the amount of contextuality measured by the relative entropy of contextuality $X(B')$ cannot differ from $X(B)$ by more than the distance $\|B - B'\|$ multiplied by a factor depending only logarithmically on the dimension of the boxes.

We have also considered a general measure of resource $X$ satisfying three proposed axioms/properties: faithfulness, local invariance and convexity. We have focused on boxes $B$ from a polytope satisfying the vertex-equivalence property, i.e., one in which all contextual vertices can be reversibly transformed into each other. We have shown that in such polytopes the measure $X$ is upper bounded by the measure called the cost of the resource $C(B)$ with a multiplicative factor $X(E_v)$ for some extremal valuable box $E_v$. Interestingly, due to this factor, we were able to bound an extensive measure (which grows linearly with the number of copies) by a non-extensive one (which takes values in $[0, 1]$ on any box irrespective of its dimension). The mentioned bound is a linear function of the box. It would be interesting to find a non-linear one which is tighter and still easily computable. We have supported these results by two examples of their application: for bipartite boxes with binary inputs and outputs, as well as for the boxes related to the contextual chain box. An analogous but weaker upper bound holds in the case of polytopes $\mathcal{B}$ which do not satisfy the vertex-equivalence property.

We have studied a distillation protocol of a valuable target box $B_v^T$ from many copies of some input box $B$, and in full analogy with the theory of entanglement measures, we have provided an upper bound on the rate of distillability of the resource $D(B_v^T|B)$. It is expressed by a measure of resource $X$ which satisfies further proposed axioms: monotonicity under the allowed class of operations, asymptotic continuity, and superadditivity on target boxes: $X([B_v^T]^{\otimes k}) \geq kX(B_v^T)$. From our investigation we can conclude that the relative entropy of contextuality for bipartite boxes with two binary inputs and outputs may be an upper bound on distillable non-locality in the form of Popescu-Rohrlich boxes. The only fact which needs to hold for the latter to be true is that this measure does not increase under wirings. We leave this remaining question as an open problem.

Finally, checking whether other measures of contextuality or non-locality, such as, e.g., [37, 38], satisfy the proposed axioms would be vital for their further use, and it would also be interesting to find new ones which satisfy the axioms by definition.

Note added: While finishing this manuscript, we became aware of the results of [39]. It seems that our results can be set in the more general framework of general resource theories formulated there in more abstract language.
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Appendix A: Formal proof of Lemma [l] 

Here we give the formal proof of Lemma [1] 

Proof. By the definition of the infimum, for any sequence $\delta_n > 0$ 
with $\delta_n \to 0$ there exists a sequence $\rho_n \in T$ such that 

$$f_T := \inf_{\rho \in T} f(\rho) \le f(\rho_n) \le f_T + \delta_n. \tag{A1}$$

Moreover, by the assumption (47) there exists a sequence 
$\sigma_n \in T'$ such that 

$$f(\rho_n) - g(\delta) \le f(\sigma_n) \le f(\rho_n) + g(\delta). \tag{A2}$$

Combining the above two sequences of inequalities we 
obtain: 

$$f_T - g(\delta) \le f(\sigma_n) \le f_T + \delta_n + g(\delta). \tag{A3}$$

Now, there exists $n_0$ such that for every $n > n_0$ there 
holds $\delta > \delta_n$, and hence 

$$f_T - g(\delta) - \delta \le f(\sigma_n) \le f_T + \delta + g(\delta). \tag{A4}$$

This means that we have obtained a sequence $f(\sigma_n)$ which 
is bounded (we use here the fact that the infima are 
bounded), and by the Bolzano-Weierstrass theorem there 
exists a subsequence $n_k$ such that $f(\sigma_{n_k})$ has a limit. Thus 
we have in particular: 

$$\lim_{k \to \infty} f(\sigma_{n_k}) - f_T \le g(\delta) + \delta. \tag{A5}$$

By definition, $f_{T'} := \inf_{\sigma \in T'} f(\sigma) = \lim_{i \to \infty} f(\sigma_{n_i})$ 
for some sequence $\{\sigma_{n_i}\}$, whereas the sequence $\{\sigma_{n_k}\}$ 
obtained above may be suboptimal (the infimum over a 
set is the infimum of the set of limits of sequences from 
this set). Therefore $f_{T'} \le \lim_{k \to \infty} f(\sigma_{n_k})$, and from the 
above inequality we get 

$$f_{T'} - f_T \le g(\delta) + \delta. \tag{A6}$$

Analogously, exchanging $T$ and $T'$ we arrive at 

$$f_T - f_{T'} \le g(\delta) + \delta, \tag{A7}$$

which proves the thesis for infima. The proof for the supremum 
goes analogously, the only change being that the inequalities 
are reversed and the signs in front of $\delta_n$ in (A1) are flipped, which leads 
us exactly to the expression (A4), but for the supremum. 
The rest of the proof goes symmetrically, hence we skip 
it. $\square$ 
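
As a rough numerical illustration of the statement proved above (not part of the proof), the sketch below builds two finite sets whose elements are paired so that the values of a function on each pair differ by at most a fixed bound, and checks that the two minima then differ by no more than that bound. The function `f`, the sets `T` and `T_prime`, the perturbation size `shift` and the bound `g_delta` are all hypothetical choices made only for this example.

```python
import random

def f(x):
    # Hypothetical test function; any bounded function would do.
    return (x - 0.3) ** 2

rng = random.Random(1)
shift = 0.01     # hypothetical size of the perturbation between paired elements
g_delta = 0.05   # hypothetical value playing the role of g(delta)

# T is a finite sample; every element of T_prime is a slightly shifted partner.
T = [rng.random() for _ in range(500)]
T_prime = [x + rng.uniform(-shift, shift) for x in T]

# Largest difference of f over the paired elements (the pairwise assumption).
pair_gap = max(abs(f(x) - f(y)) for x, y in zip(T, T_prime))

# Difference between the two minima (finite-set analogue of the infima).
inf_gap = abs(min(map(f, T)) - min(map(f, T_prime)))

print(f"pairwise gap = {pair_gap:.4f}, gap of minima = {inf_gap:.4f}")
```

In this finite setting the minima are attained, so the extra slack $\delta$ that appears in the proof for infima is not needed.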

Appendix B: Convexity of $I_{\max}$ 

In this section we present an explicit proof of another 
property of the measure of contextuality, namely 
its convexity. This property was used in Ref. [27] (see Eq. 
(2)), but without a formal proof. We will first prove 
convexity of $I_{p(c)}$ and then, using the definition of the supremum, 
we will show convexity of the measure $I_{\max}$. Note that 
the convexity of the mutual information of contextuality 
$I_{\max}$ means that the relative entropy of contextuality 
$X_{\max}$ is also convex, because of the equivalence of the two 
measures. 


Let us denote by $B_{\rm mix}$ a convex combination of boxes: 

$$B_{\rm mix} = \sum_i p_i B_i, \tag{B1}$$

where $B_i$ are not necessarily extremal (or deterministic) 
boxes. Then, by the definition of a box, we have: 

$$p_{B_{\rm mix}}(\lambda_c) = \sum_i p_i\, p_{B_i}(\lambda_c). \tag{B2}$$

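We now have the following: 

$$\begin{aligned}
I_{p(c)}\Big(\sum_i p_i B_i\Big) &= I_{p(c)}(B_{\rm mix}) \\
&= \min_{p(\lambda)} \sum_c p(c)\, D\big(p_{B_{\rm mix}}(\lambda_c)\,\big\|\,p(\lambda_c)\big) \\
&\le \sum_c p(c)\, D\Big(p_{B_{\rm mix}}(\lambda_c)\,\Big\|\,\sum_i p_i\, p^{i*}(\lambda_c)\Big),
\end{aligned} \tag{B3}$$

where $p^{i*}(\lambda_c)$ is a marginal distribution obtained from a 
joint probability distribution $p^{i*}(\lambda)$ optimal for a particular 
box $B_i$. The above inequality comes from the fact 
that the distribution $p^{*}(\lambda) = \sum_i p_i\, p^{i*}(\lambda)$ does not necessarily 
give the desired minimum over all distributions $p(\lambda)$. 
Furthermore, we have 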

$$\begin{aligned}
\sum_c p(c)\, D\Big(\sum_i p_i\, p_{B_i}(\lambda_c)\,\Big\|\,\sum_i p_i\, p^{i*}(\lambda_c)\Big)
&\le \sum_i p_i \sum_c p(c)\, D\big(p_{B_i}(\lambda_c)\,\big\|\,p^{i*}(\lambda_c)\big) \\
&= \sum_i p_i\, I_{p(c)}(B_i),
\end{aligned} \tag{B4}$$

where the inequality comes from the joint convexity of the relative 
entropy for each $c$, while for the last equality 
we utilized the optimality of $p^{i*}(\lambda)$ for each box $B_i$. 
Using the results (B3) and (B4) we arrive at 

$$I_{p(c)}\Big(\sum_i p_i B_i\Big) \le \sum_i p_i\, I_{p(c)}(B_i). \tag{B5}$$
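
The key step in the argument above is the joint convexity of the relative entropy. As a hedged numerical sanity check (an illustration only, not a substitute for the proof), the following sketch samples random mixtures of distribution pairs and evaluates the gap between the two sides of the joint convexity inequality. The helper names `kl`, `random_dist`, `mix` and `joint_convexity_gap` are hypothetical, as are the dimensions and sample counts.

```python
import math
import random

def kl(p, q):
    """Relative entropy D(p||q) in bits; assumes q > 0 wherever p > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def random_dist(k, rng):
    """Random strictly positive probability vector of length k."""
    w = [rng.random() + 1e-3 for _ in range(k)]
    s = sum(w)
    return [x / s for x in w]

def mix(weights, dists):
    """Convex combination sum_i w_i * dists[i], taken entrywise."""
    return [sum(w * d[j] for w, d in zip(weights, dists))
            for j in range(len(dists[0]))]

def joint_convexity_gap(weights, ps, qs):
    """sum_i w_i D(p_i||q_i) - D(sum_i w_i p_i || sum_i w_i q_i)."""
    rhs = sum(w * kl(p, q) for w, p, q in zip(weights, ps, qs))
    lhs = kl(mix(weights, ps), mix(weights, qs))
    return rhs - lhs

rng = random.Random(0)
gaps = []
for _ in range(1000):
    w = random_dist(3, rng)                       # mixing weights p_i
    ps = [random_dist(4, rng) for _ in range(3)]  # stand-ins for p_{B_i}(lambda_c)
    qs = [random_dist(4, rng) for _ in range(3)]  # stand-ins for p^{i*}(lambda_c)
    gaps.append(joint_convexity_gap(w, ps, qs))

min_gap = min(gaps)
print("smallest gap over all samples:", min_gap)
```

Joint convexity predicts that the gap is never negative, up to floating-point rounding. The sketch does not perform the minimization over $p(\lambda)$ that defines $I_{p(c)}$; it only tests the inequality used for each fixed context $c$.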

Now, by the definition of the supremum, for any $\delta_n > 0$ 
there exists a distribution $p_n(c)$ such that 

$$I_{\max}\Big(\sum_i p_i B_i\Big) \le I_{p_n(c)}\Big(\sum_i p_i B_i\Big) + \delta_n, \tag{B6}$$

hence, by the convexity of $I_{p_n(c)}$, we have 

$$I_{\max}\Big(\sum_i p_i B_i\Big) \le \sum_i p_i\, I_{p_n(c)}(B_i) + \delta_n. \tag{B7}$$

Notice that for each $i$ the definition of $I_{\max}$ assures that 
$I_{p_n(c)}(B_i) \le I_{\max}(B_i)$. Thus 

$$I_{\max}\Big(\sum_i p_i B_i\Big) \le \sum_i p_i\, I_{\max}(B_i) + \delta_n, \tag{B8}$$

and because $\delta_n$ can be arbitrarily small, we obtain the 
desired convexity of $I_{\max}$. 

Appendix C: Locally performed wirings satisfy the 
axiom (O2) 

In this section we present a formal proof of the fact 
that if Alice and Bob have access to n boxes, such that 
the collection of the latter admits a local hidden variable 
model, then by means of locally performed wirings 
HU one cannot transform the collection of boxes into a 
valuable (nonlocal) resource shared by the two parties. 

Consider then a collection of n boxes shared by Alice 
and Bob which admits a local hidden variable model with 
respect to both parties 

$$B_L = \sum_\lambda p_\lambda\, p^{(\lambda)}(\mathbf{a}|\mathbf{x}) \otimes p^{(\lambda)}(\mathbf{b}|\mathbf{y}), \tag{C1}$$

for some probability distribution $\{p_\lambda\}$, where $\mathbf{a} = (a_1, \ldots, a_m)$ 
($\mathbf{b} = (b_1, \ldots, b_m)$) is the vector of Alice's 
(Bob's) outputs when one of the inputs from $\mathbf{x} = (x_1, \ldots, x_n)$ 
($\mathbf{y} = (y_1, \ldots, y_n)$) is chosen. 

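We will assume that locally the distribution of the 
collection of boxes as seen by one party (e.g. Alice), 
$p(\mathbf{a}|\mathbf{x})$, is nonsignalling [33], i.e. the following conditions 
are satisfied (with $x_{\bar{i}}$ denoting the inputs other than $x_i$) 

$$\forall_{i,\,x_i,\,x_i'} \quad \sum_{a_i} p(\mathbf{a}|x_{\bar{i}}, x_i) = \sum_{a_i} p(\mathbf{a}|x_{\bar{i}}, x_i'), \tag{C2}$$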

and analogously for Bob. Note that the nonsignalling 
conditions given above imply nonsignalling with respect 
to all subsets of inputs, i.e., the marginal distribution of any 
subset of the outputs does not depend on changing the inputs 
associated with the remaining outputs [35]. 
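
To make conditions of the type (C2) concrete, here is a minimal Python sketch that tests them for the one-party distribution of a collection of n boxes with binary inputs and outputs. The dictionary layout `p[(a, x)]`, the helper names `make_box` and `is_nonsignalling`, and the three example distributions are assumptions introduced only for this illustration; they are not taken from the paper.

```python
from itertools import product

BITS = (0, 1)

def make_box(rule, n=2):
    """Tabulate p(a|x) as a dict keyed by (a, x), with a, x tuples of n bits."""
    return {(a, x): rule(a, x)
            for a in product(BITS, repeat=n)
            for x in product(BITS, repeat=n)}

def is_nonsignalling(p, n, tol=1e-12):
    """Check that summing over a_i gives a result independent of x_i, for all i."""
    for i in range(n):
        for x in product(BITS, repeat=n):
            if x[i] != 0:
                continue  # compare x_i = 0 against x_i = 1 only once
            x_flipped = x[:i] + (1,) + x[i + 1:]
            for a_rest in product(BITS, repeat=n - 1):
                def marginal(xx):
                    # Sum over the i-th output, all other outputs held fixed.
                    return sum(p[(a_rest[:i] + (ai,) + a_rest[i:], xx)]
                               for ai in BITS)
                if abs(marginal(x) - marginal(x_flipped)) > tol:
                    return False
    return True

# Uniformly random outputs: trivially nonsignalling.
uniform = make_box(lambda a, x: 0.25)
# Popescu-Rohrlich-type correlations between the two boxes: nonsignalling.
pr_like = make_box(lambda a, x: 0.5 if (a[0] ^ a[1]) == (x[0] & x[1]) else 0.0)
# Second output copies the first input: signalling from box 1 to box 2.
signalling = make_box(lambda a, x: 1.0 if a == (0, x[0]) else 0.0)

for name, box in [("uniform", uniform), ("pr_like", pr_like),
                  ("signalling", signalling)]:
    print(name, is_nonsignalling(box, 2))
```

The third example mimics what a one-way wiring between two constituent boxes can produce when no constraint is imposed on the local distribution: the output of one box reveals the input of the other, so the conditions fail.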


Consider now the partition of the constituent boxes 
$A_1 : A_2 : B$, where $A_1 = \{x_1, \ldots, x_k\}$, $A_2 = \{x_{k+1}, \ldots, x_n\}$, 
$B = \{y_1, \ldots, y_n\}$ for an arbitrary $1 \le k \le n - 1$. 
As was shown in Ref. [34], locality in the 
partition $A_1, A_2 : B$ may not be preserved when the subsystems 
$A_1$ and $A_2$ cooperate, i.e., when they perform a 
suitable wiring. This happens when no constraints are 
imposed on the distribution $p^{(\lambda)}(\mathbf{a}|\mathbf{x})$ in the decomposition 
(C1). However, when the local distribution satisfies the 
nonsignalling conditions (C2), the operation of wiring 
the subsystems $A_1$ and $A_2$ will not lead to the emergence of 
signalling for the one-partite distribution $p(\mathbf{a}|\mathbf{x})$. Since 
such nonsignalling bilocal distributions (NSBL) constitute 
a closed set under wirings [34], we see that 
locally performed wirings will not produce 
a valuable (nonlocal) resource from useless (local) 
objects (see the Supplemental Material of Ref. [34], where 
the nonsignalling conditions (C2) need to be assumed). 